Lua: how do I convert bytes to a string, other than by looping over string.char(xx)?
票票飞扬
2009-07-29 02:12:10
In Lua, how do I convert bytes to a string, other than by looping over string.char(xx)?
fibbery
2009-07-30
[Quote=Quoting reply #2 by genphone_ru:]
Solved:
local str = arrbuf:tvb():string
[/Quote]
Congratulations, then!
票票飞扬
2009-07-30
Solved:
local str = arrbuf:tvb():string
fibbery
2009-07-29
http://tech.it168.com/jd/2008-02-17/200802171009669.shtml
See whether this is useful to you.
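For readers without the arrbuf object used in the accepted reply, here is a minimal sketch in plain Lua. It assumes the bytes are held in an ordinary array table of values 0 to 255, which the thread does not state. The helper name bytes_to_string and the batch size of 1024 are hypothetical. string.char accepts several byte values in one call, so the table can be unpacked in batches instead of concatenating one character at a time. There is still a loop, but it runs once per batch rather than once per byte.

```lua
-- `unpack` is a global in Lua 5.1 and is `table.unpack` from Lua 5.2 on.
local unpack = table.unpack or unpack

-- Hypothetical helper: turn an array of byte values (0-255) into a string.
local function bytes_to_string(bytes, chunk_size)
  chunk_size = chunk_size or 1024   -- assumed batch size, kept small to stay within stack limits
  local parts = {}
  for i = 1, #bytes, chunk_size do
    local last = math.min(i + chunk_size - 1, #bytes)
    -- string.char takes any number of byte values and returns one string
    parts[#parts + 1] = string.char(unpack(bytes, i, last))
  end
  return table.concat(parts)
end

print(bytes_to_string({72, 101, 108, 108, 111}))  --> Hello
```

For short arrays, string.char(unpack(bytes)) alone is enough. The batching only matters when the array is large enough to exceed the number of values that unpack can return at once.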
Related resource: LuaUnicode icu-lua

Source: http://lua-users.org/wiki/LuaUnicode

Contents:
|   LuaUnicode.url
|
+---0.13A
|       ICU4Lua-0.13A-src.zip
|       ICU4Lua-0.13A-win32-dll.zip
|
\---0.2B
        ICU4Lua-0.2B-docs.zip
        ICU4Lua-0.2B-src.zip
        ICU4Lua-0.2B-win32dll.zip

The following comes from http://lua-users.org/wiki/LuaUnicode

This is an attempt to answer the LuaFaq: Can I use unicode strings? or Does Lua support unicode?

In short, yes and no. Lua gives you the bare bones support and enough rope and not much else. Unicode is a large and complex standard and questions like "does lua support unicode" are extremely vague.

Some of the issues are:
  Can I store and retrieve Unicode strings?
  Can my Lua programs be written in Unicode?
  Can I compare Unicode strings for equality?
  Sorting strings.
  Pattern matching.
  Can I determine the length of a Unicode string?
  Support for bracket matching, bidirectional printing, arbitrary composition of characters, and other issues that arise in high quality typesetting.

Lua strings are fully 8-bit clean, so simple uses are supported (like storing and retrieving), but there's no built in support for more sophisticated uses. For a fuller story, see below.

Unicode strings and Lua strings

A Lua string is an arbitrary sequence of values which have at least 8 bits (octets); they map directly into the char type of the C compiler. (This may be wider than eight bits, but eight bits are guaranteed.) Lua does not reserve any value, including NUL. That means that you can store a UTF-8 string in Lua without problems.

Note that UTF-8 is just one option for storing Unicode strings. There are many other encoding schemes, including UTF-16 and UTF-32 and their various big-endian/little-endian variants. However, all of these are simply sequences of octets and can be stored in a Lua string without problems.

Input and output of strings in Lua (using the io library) uses C's stdio library. ANSI C does not require the stdio library to handle arbitrary octet sequences unless the file is opened in binary mode; furthermore, in non-binary mode, some octet sequences are converted into other ones (in order to deal with varying end-of-line markers on different platforms).

This may affect your ability to do non-binary file input and output of Unicode strings in formats other than UTF-8. UTF-8 strings will probably be safe because UTF-8 does not use control characters such as \n and \r as part of multi-octet encodings. However, there are no guarantees; if you need to be certain, you must use binary mode input and output. (If you do so, line-endings will not be converted.)

Unix file IO has been 8-bit clean for a long while. If you are not concerned with portability and are only using Unix and Unix-like operating systems, you can almost certainly not worry about the above.

If your use of Unicode is restricted to passing the strings to external libraries which support Unicode, you should be OK. For example, you should be able to extract a Unicode string from a database and pass it to a Unicode-aware graphics library. But see the sections below on pattern matching and string equality.

Unicode Lua programs

Literal Unicode strings can appear in your Lua programs. Either a UTF-8 encoded string can appear directly with 8-bit characters or you can use the \ddd syntax (note that ddd is a decimal number, unlike some other languages). However, there is no facility for encoding multi-octet sequences (such as \U+20B4); you would need to either manually encode them to UTF-8, or insert individual octets in the correct big-endian/little-endian order (for UTF-16 or UTF-32).

Unless you are using an operating system in which a char is more than eight bits wide, you will not be able to use arbitrary Unicode characters in Lua identifiers (for the names of variables and so on). You may be able to use eight-bit characters outside of the ANSI range. Lua uses the C functions isalpha and isalnum to identify valid characters in identifiers, so it will depend on the current locale. To be honest, using characters outside of the ANSI range in Lua identifiers is not a good idea, since your programs will not compile in the standard C locale.

Comparison and Sorting

Lua string comparison (using the == operator) is done byte-by-byte. That means that == can only be used to compare Unicode strings for equality if the strings have been normalized in one of the four Unicode normalizations. (See the [Unicode FAQ on normalization] for details.) The standard Lua library does not provide any facility for normalizing Unicode strings. Consequently, non-normalized Unicode strings cannot be reliably used as table keys.

If you want to use the Unicode notion of string equality, or use Unicode strings as table keys, and you cannot guarantee that your strings are normalized, then you'll have to write or find a normalization function and use that; this is a non-trivial exercise!

The Lua comparison operators on strings (< and <=) use the C function strcoll, which is locale dependent. This means that two strings can compare in different ways according to what the current locale is. For example, strings will compare differently under Spanish Traditional sorting than under Welsh sorting.

It may be that your operating system has a locale that implements the sorting algorithm that you want, in which case you can just use that; otherwise you will have to write a function to sort Unicode strings. This is an even more non-trivial exercise.

UTF-8 was designed so that a naive octet-by-octet comparison of two UTF-8 strings produces the same order as comparing the underlying sequences of code points. This is also true of UTF-32BE but I do not know of any system which uses that encoding. Unfortunately, naive octet-by-octet comparison is not the collation order used by any language.

(Note: sometimes people use the terms UCS-2 and UCS-4 for "two-byte" and "four-byte" encodings. These are not Unicode standards; they come from the closely corresponding ISO standard ISO/IEC 10646-1:2000 and currently differ in that they allow codes outside of the Unicode range, which runs from 0x0 to 0x10FFFF.)

Pattern Matching

Lua's pattern matching facilities work character by character. In general, this will not work for Unicode pattern matching, although some things will work as you want. For example, "%u" will not match all Unicode upper case letters. You can match individual Unicode characters in a normalized Unicode string, but you might want to worry about combining character sequences. If there are no following combining characters, "a" will match only the letter a in a UTF-8 string. In UTF-16LE you could match "a%z". (Remember that you cannot use \0 in a Lua pattern.)

Length and string indexing

If you want to know the length of a Unicode string there are different answers you might want according to the circumstances.

If you just want to know how many bytes the string occupies, so that you can make space for copying it into a buffer for example, then the existing Lua function string.len will work.

You might want to know how many Unicode characters are in a string. Depending on the encoding used, a single Unicode character may occupy up to four bytes. Only UTF-32LE and UTF-32BE are constant length encodings (four bytes per character); UTF-32 is mostly a constant length encoding but the first element in a UTF-32 sequence should be a "Byte Order Mark", which does not count as a character. (UTF-32 and variants are part of Unicode with the latest version, Unicode 4.0.)

Some implementations of UTF-16 assume that all characters are two bytes long, but this has not been true since Unicode version 3.0.

Happily UTF-8 is designed so that it is relatively easy to count the number of unicode symbols in a string: simply count the number of octets that are in the ranges 0x00 to 0x7F (inclusive) or 0xC2 to 0xF4 (inclusive). (In decimal, 0-127 and 194-244.) These are the codes which can start a UTF-8 character code. Octets 0xC0, 0xC1 and 0xF5 to 0xFF (192, 193 and 245-255) cannot appear in a conforming UTF-8 sequence; octets in the range 0x80 to 0xBF (128-191) can only appear in the second and subsequent octets of a multi-octet encoding. Remember that you cannot use \0 in a Lua pattern.

For example, you could use the following code snippet to count UTF-8 characters in a string you knew to be conforming (it will incorrectly count some invalid characters):

    local _, count = string.gsub(unicode_string, "[^\128-\193]", "")

If you want to know how many printing columns a Unicode string will occupy when you print it out using a fixed-width font (imagine you are writing something like the Unix ls program that formats its output into several columns), then that is a different answer again. That's because some Unicode characters do not have a printing width, while others are double-width characters. Combining characters are used to add accents to other letters, and generally they do not take up any extra space when printed.

So that's at least 3 different notions of length that you might want at different times. Lua provides one of them (string.len); the others you'll need to write functions for.

There's a similar issue with indexing the characters of a string by position. string.sub(s, -3) will return the last 3 bytes of the string, which is not necessarily the same as the last three characters of the string, and may or may not be a complete code.

You could use the following code snippet to iterate over UTF-8 sequences (this will simply skip over most invalid codes; string.gfind was renamed string.gmatch in Lua 5.1):

    for uchar in string.gfind(ustring, "([%z\1-\127\194-\244][\128-\191]*)") do
      -- something
    end

More sophisticated issues

As you might have guessed by now, Lua provides no support for things like bidirectional printing or the proper formatting of Thai accents. Normally such things will be taken care of by a graphics or typography library. It would of course be possible to interface to such a library that did these things if you had access to one.

There is a little string-like package [slnunicode] with upper/lower, len/sub and pattern matching for UTF-8.

See ValidateUnicodeString for a smaller library.

[ICU4Lua] is a Lua binding to ICU (International Components for Unicode [1]), an open-source library originally developed by IBM.

See UnicodeIdentifers for platform independent Unicode Lua programs.
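The counting and iteration techniques from the LuaUnicode text can be tried out with the short sketch below. It assumes Lua 5.1 or later (so string.gmatch rather than string.gfind) and input that is already valid UTF-8. The helper names utf8_len and utf8_chars are hypothetical. The iteration pattern is a simplified variant of the one quoted in the text: it treats every octet outside 0x80-0xBF as the start of a character, which avoids %z but does not skip invalid lead octets.

```lua
-- Count characters by counting the octets that can start a UTF-8 sequence
-- (same pattern as in the quoted text).
local function utf8_len(s)
  local _, count = string.gsub(s, "[^\128-\193]", "")
  return count
end

-- Iterate over UTF-8 sequences: one non-continuation octet followed by
-- any number of continuation octets (0x80-0xBF).
local function utf8_chars(s)
  return string.gmatch(s, "[^\128-\191][\128-\191]*")
end

-- "h", e-acute (2 octets), "l", "l", "o", euro sign (3 octets)
local s = "h\195\169llo\226\130\172"

print(string.len(s))  --> 9  (bytes)
print(utf8_len(s))    --> 6  (characters)

local chars = {}
for ch in utf8_chars(s) do
  chars[#chars + 1] = ch
end

-- Last three characters versus last three bytes:
local last3_chars = table.concat(chars, "", #chars - 2)  -- "lo" plus the euro sign
local last3_bytes = string.sub(s, -3)                    -- only the euro sign's 3 octets
print(#last3_chars, #last3_bytes)  --> 5  3
```

Neither helper accounts for combining characters or display width, which the text treats as separate notions of length.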
Related article: Summary of the string library in Lua

The Lua interpreter itself has very limited support for strings. A program can create strings and concatenate them, but it cannot extract substrings, check a string's size, or examine a string's contents. Almost all string manipulation in Lua comes from the string library.

Some functions in the string library are very simple:

string.len(s) returns the length of the string s.
string.rep(s, n) returns the string s repeated n times; string.rep("a", 2^20) creates a 1 MB string (for testing, for example).
string.lower(s) converts the uppercase letters in s to lowercase (string.upper converts lowercase to uppercase). ...

Other related articles:
Converting between bytes (str) and hex strings in Lua (type conversion in Lua scripts: bytes and hex strings)
The Lua String library (standard library)
Using string in Lua
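A quick illustration of the four functions named above, using only the standard string library. The variable name greeting is arbitrary.

```lua
local greeting = "Hello"

print(string.len(greeting))        --> 5
print(string.rep("ab", 3))         --> ababab
print(string.lower(greeting))      --> hello
print(string.upper(greeting))      --> HELLO

-- A 1 MB string of "a" characters, as mentioned in the text
local big = string.rep("a", 2^20)
print(string.len(big))             --> 1048576
```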