boost::regex

s5unty 2006-10-20 03:33:55

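I have the string "google.com baidu.com". Domain names are separated by spaces, but the number of domains is not known in advance.

With expression 1, "(\\w+)", I only get google.
With expression 2, "(\\w+\\.\\w+)", I only get google.com.

I want to get google.com and baidu.com in turn. Without knowing whether another domain such as other.com follows baidu.com, is it possible to do this with a single expression?

I also need to validate each domain in the string: two consecutive dots are not allowed, and a domain with no dot at all is not allowed.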
===========
#include <cstddef>
#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {
    // Placeholder: this pattern is the expression being asked about.
    boost::regex reg("?? how should this expression be written ??",
                     boost::regbase::extended | boost::regbase::icase);
    std::string str = "google.com baidu.com";
    boost::smatch matches;

    std::string::const_iterator it = str.begin();
    std::string::const_iterator end = str.end();
    // Wanted: false if any domain in str fails validation, true only if all pass.
    if (boost::regex_search(it, end, matches, reg)) {
        // Wanted: when every domain passes, each one is available here in order.
        for (std::size_t i = 1; i < matches.size(); ++i) {
            std::cout << matches[i] << std::endl;
        }
    }

    return 0;
}

If you have a better approach, please let me know. Thanks!


s5unty 2006-11-02


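"With the expression "\\w+\\.[^\\s]+" you can get each domain."

That sentence of mine was wrong: the part of the expression we want has to be wrapped in parentheses. So the expression for getting each domain should be written as
"(\\w+\\.[^\\s]+)"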
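
The parentheses matter because regex_split outputs the marked sub-expressions of each match; when the expression has none, it outputs the text between matches instead. The Boost.Regex documentation lists regex_split as deprecated and points to regex_token_iterator. Below is a minimal sketch of that iterator style. It uses std::sregex_token_iterator from the C++11 standard library so that it builds without Boost (an assumption for illustration; boost::sregex_token_iterator is used the same way). The helper name extract_domains is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns sub-expression 1 of every match of `pattern`
// found in `text`, in order of appearance.
std::vector<std::string> extract_domains(const std::string& text,
                                         const std::string& pattern) {
    const std::regex re(pattern, std::regex::icase);
    std::vector<std::string> result;
    // The last argument (1) selects sub-expression 1 of each match.
    std::sregex_token_iterator it(text.begin(), text.end(), re, 1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        result.push_back(it->str());
    }
    return result;
}
```

Calling extract_domains("search google.com www.baidu.com", "(\\w+\\.[^\\s]+)") should skip "search", which contains no dot, and return the two domains.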

s5unty 2006-11-01


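I finally solved this today. I had been expecting boost::smatch to return each of the domains, and that turned out to be wrong.

I ended up using regex_split, which solved the problem. The code looks like this: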
#include <iostream>
#include <iterator>
#include <string>
#include <vector>
#include <boost/regex.hpp>

using namespace std;

int main(int argc, const char* argv[]) {
    if (argc < 2) {
        cout << "usage: ./a.out \"REGEX\"" << endl;
        return -1;
    }

    string exp = argv[1];
    string str = "search google.com www.baidu.com";
    boost::regex reg(exp, boost::regbase::icase);

    // section 1. we need.
    vector<string> what;
    boost::regex_split(back_inserter(what), str, reg);
    copy(what.begin(), what.end(), ostream_iterator<string>(cout, "\n"));

    /*
    // section 2. invalid.
    boost::smatch what;
    if (!boost::regex_search(str, what, reg)) {
        cout << "not found" << endl;
        return 0;
    }

    copy(what.begin(), what.end(), ostream_iterator<string>(cout, "\n"));
    */
    return 0;
}

Running it with the expression "\\w+\\.[^\\s]+" gives each of the domains.
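
The original question also asked for validation: no two consecutive dots, and at least one dot per domain. A single regex_search cannot return every domain through smatch, because a repeated capture group keeps only its last match. One way to get both the yes/no answer and the list is to split on whitespace and check each token with regex_match. The sketch below does that with std::regex and std::istringstream from the standard library (an assumption so that it builds without Boost; boost::regex_match takes the same arguments). The helper name parse_domains and the pattern are illustrative, and the pattern follows the thread in treating a label as \w characters only.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on whitespace and validates each token.
// Returns true and fills `domains` only if there is at least one token and
// every token passes; otherwise returns false and leaves `domains` empty.
bool parse_domains(const std::string& text, std::vector<std::string>& domains) {
    // One or more \w labels joined by single dots, with at least one dot.
    // Rejects "a..b" (consecutive dots) and "localhost" (no dot).
    static const std::regex domain_re("\\w+(\\.\\w+)+", std::regex::icase);
    domains.clear();
    std::istringstream in(text);
    std::string token;
    while (in >> token) {
        if (!std::regex_match(token, domain_re)) {
            domains.clear();
            return false;
        }
        domains.push_back(token);
    }
    return !domains.empty();
}
```

With this shape, the if branch in the original question becomes a call to parse_domains, and the loop over matches becomes a loop over the filled vector.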

s5unty 2006-10-20


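Thanks to sunman1982 for introducing me to tokenizer; it really is convenient for splitting strings. For string replacement, though, regex_replace is probably more convenient than tokenizer.

The braces tony1978 wrote seem to be a problem: braces are used to specify how many times something is matched.

Thanks again to both of you, especially the one from outside the solar system :)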
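
For the replacement case mentioned above, here is a minimal sketch of what regex_replace looks like. It uses std::regex_replace from the C++11 standard library (an assumption so that it builds without Boost; boost::regex_replace accepts the same three arguments). The helper name mark_domains, the pattern, and the angle-bracket output are only for illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: wraps every domain-like token in angle brackets and
// leaves the rest of the text untouched.
std::string mark_domains(const std::string& text) {
    // One or more \w labels joined by single dots, with at least one dot.
    static const std::regex domain_re("\\w+(\\.\\w+)+");
    // "$&" in the format string stands for the whole match.
    return std::regex_replace(text, domain_re, "<$&>");
}
```

A tokenizer would need the pieces to be split, edited, and joined again by hand, while regex_replace rewrites the matches in one call.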

sunman1982 2006-10-20


E4800%make
g++ -I/usr/openwin/include -I/usr/include -I./include -I. -I. -c token.cc

Linking aaa .........
done

E4800%aaa
google.com
baidu.com
163.com

sunman1982 2006-10-20


vi token.cc
"token.cc" 18 lines, 444 characters
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>
#include <boost/lambda/lambda.hpp>
#include <boost/tokenizer.hpp>

using namespace std;
using namespace boost::lambda;

int main()
{
    typedef boost::tokenizer<boost::char_separator<char> > Tok;
    // Split on spaces and tabs. The posted separator "'\t'" would also treat
    // the quote characters as delimiters and would not split on spaces.
    boost::char_separator<char> sep(" \t");
    string str = "google.com baidu.com 163.com";
    Tok tok(str, sep);
    // Print each token on its own line.
    for_each(tok.begin(), tok.end(), cout << _1 << constant('\n'));
    return 0;
}