Some questions about using the libcurl library

xdayong 2020-04-16 11:05:46

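I am using libcurl's POP3 support. After connecting to the mailbox I request the message list, but in the CURLOPT_WRITEFUNCTION callback the memory pointed to holds the correct data while the reported length does not match the actual length. For example, my mailbox has 2 unread messages, and in the callback the memory contains: 31 20 33 36 36 34 0d 0a 32 20 31 38 35 38 0d 0a 2e 0d 0a (that is, "1 3664" and "2 1858"). The second and third callback parameters, however, are 1 and 6, so the data length computed the way the official documentation describes is 1*6=6, whereas the actual valid length should be 19 (or 12 if the CRLFs and the '.' are excluded).

Stepping through the curl source, I found that the length received in Curl_pop3_write is also 19, but during parsing it invokes the callback as soon as it reaches the first CRLF, which is why the parameters I get are wrong.

I don't know the POP3 protocol well, and from reading this code I can't tell how it should be changed. I would appreciate any pointers from someone who knows.

Below is the curl source that parses the data and calls the upper-layer callback; it invokes the callback through Curl_client_write.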

CURLcode Curl_pop3_write(struct connectdata* conn, char* str, size_t nread)
{
	/* This code could be made into a special function in the handler struct */
	CURLcode result = CURLE_OK;
	struct Curl_easy* data = conn->data;
	struct SingleRequest* k = &data->req;

	struct pop3_conn* pop3c = &conn->proto.pop3c;
	bool strip_dot = FALSE;
	size_t last = 0;
	size_t i;

	/* Search through the buffer looking for the end-of-body marker which is
	   5 bytes (0d 0a 2e 0d 0a). Note that a line starting with a dot matches
	   the eob so the server will have prefixed it with an extra dot which we
	   need to strip out. Additionally the marker could of course be spread out
	   over 5 different data chunks. */
	for (i = 0; i < nread-1; i++) {
		size_t prev = pop3c->eob;

		switch (str[i]) {
		case 0x0d:
			if (pop3c->eob == 0) {
				pop3c->eob++;

				if (i) {
					/* Write out the body part that didn't match */
					result = Curl_client_write(conn, CLIENTWRITE_BODY, &str[last],
						i - last);

					if (result)
						return result;

					last = i;
				}
			}
			else if (pop3c->eob == 3)
				pop3c->eob++;
			else
				/* If the character match wasn't at position 0 or 3 then restart the
				   pattern matching */
				pop3c->eob = 1;
			break;

		case 0x0a:
			if (pop3c->eob == 1 || pop3c->eob == 4)
				pop3c->eob++;
			else
				/* If the character match wasn't at position 1 or 4 then start the
				   search again */
				pop3c->eob = 0;
			break;

		case 0x2e:
			if (pop3c->eob == 2)
				pop3c->eob++;
			else if (pop3c->eob == 3) {
				/* We have an extra dot after the CRLF which we need to strip off */
				strip_dot = TRUE;
				pop3c->eob = 0;
			}
			else
				/* If the character match wasn't at position 2 then start the search
				   again */
				pop3c->eob = 0;
			break;

		default:
			pop3c->eob = 0;
			break;
		}

		/* Did we have a partial match which has subsequently failed? */
		if (prev && prev >= pop3c->eob) {
			/* Strip can only be non-zero for the very first mismatch after CRLF
			   and then both prev and strip are equal and nothing will be output
			   below */
			while (prev && pop3c->strip) {
				prev--;
				pop3c->strip--;
			}

			if (prev) {
				/* If the partial match was the CRLF and dot then only write the CRLF
				   as the server would have inserted the dot */
				result = Curl_client_write(conn, CLIENTWRITE_BODY, (char*)POP3_EOB,
					strip_dot ? prev - 1 : prev);

				if (result)
					return result;

				last = i;
				strip_dot = FALSE;
			}
		}
	}

	if (pop3c->eob == POP3_EOB_LEN) {
		/* We have a full match so the transfer is done, however we must transfer
		the CRLF at the start of the EOB as this is considered to be part of the
		message as per RFC-1939, sect. 3 */
		result = Curl_client_write(conn, CLIENTWRITE_BODY, (char*)POP3_EOB, 2);

		k->keepon &= ~KEEP_RECV;
		pop3c->eob = 0;

		return result;
	}

	if (pop3c->eob)
		/* While EOB is matching nothing should be output */
		return CURLE_OK;

	if (nread - last) {
		result = Curl_client_write(conn, CLIENTWRITE_BODY, &str[last],
			nread - last);
	}

	return result;
}
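
The 6 reported in the first callback can be reproduced by tracing the function above by hand. The sketch below is a standalone adaptation, not curl's own code: `scan_state`, `write_log`, `log_write` and `scan_chunk` are hypothetical names, each `Curl_client_write` call is replaced by recording its length, the matcher state starts at zero, and the loop scans all `nread` bytes (the posted copy stops at `nread-1`). All of these are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define EOB_LEN 5   /* end-of-body marker: CR LF '.' CR LF */
#define MAX_LOG 16

/* Hypothetical stand-ins for pop3c->eob / pop3c->strip and for the
   sequence of Curl_client_write() calls. */
struct scan_state { size_t eob, strip; };
struct write_log { size_t len[MAX_LOG]; size_t count; };

static void log_write(struct write_log *w, size_t len)
{
	assert(w->count < MAX_LOG);
	w->len[w->count++] = len;
}

/* Same scanning logic as the posted function, but it records the length
   of each client write instead of delivering the bytes. */
static void scan_chunk(struct scan_state *s, const char *str, size_t nread,
                       struct write_log *w)
{
	int strip_dot = 0;
	size_t last = 0, i;

	for (i = 0; i < nread; i++) {
		size_t prev = s->eob;

		switch (str[i]) {
		case 0x0d:
			if (s->eob == 0) {
				s->eob++;
				if (i) {
					log_write(w, i - last);   /* body before the CR */
					last = i;
				}
			}
			else if (s->eob == 3)
				s->eob++;
			else
				s->eob = 1;
			break;
		case 0x0a:
			s->eob = (s->eob == 1 || s->eob == 4) ? s->eob + 1 : 0;
			break;
		case 0x2e:
			if (s->eob == 2)
				s->eob++;
			else {
				if (s->eob == 3)
					strip_dot = 1;        /* dot inserted by the server */
				s->eob = 0;
			}
			break;
		default:
			s->eob = 0;
			break;
		}

		/* A partial marker match failed: emit the held-back marker bytes. */
		if (prev && prev >= s->eob) {
			while (prev && s->strip) {
				prev--;
				s->strip--;
			}
			if (prev) {
				log_write(w, strip_dot ? prev - 1 : prev);
				last = i;
				strip_dot = 0;
			}
		}
	}

	if (s->eob == EOB_LEN) {
		log_write(w, 2);                      /* CRLF that starts the marker */
		s->eob = 0;
		return;
	}
	if (s->eob)
		return;                               /* marker still matching: hold output */
	if (nread - last)
		log_write(w, nread - last);
}
```

Fed the 19 bytes from the original post, this records four writes of 6, 2, 6 and 2 bytes: the two list lines, each followed by its CRLF, with the terminating '.' CRLF withheld. Under these assumptions, size=1 and nmemb=6 describe only the first of several callbacks for that buffer, not the whole buffer.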

_mervyn 2020-04-16

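Quoting xdayong's reply (#3):

Thanks for the reply above. I do handle multiple callbacks and buffer the data, and I return nmemb every time. The problem is that even so, the information I get is still wrong. The callback receives three parameters, char *ptr, size_t size, size_t nmemb: the first is a memory pointer whose valid region is 19 bytes (I found this by tracing the source), the second is 1, and the third is 6. So what I buffer is only 6 bytes. My main problem is that inside the callback I cannot determine the actual size of the valid data.

Then there should be no problem, right? When do you receive the 6 bytes, on which callback, and are there no further callbacks afterwards? What is your actual problem, and why get hung up on this? I don't know the POP3 protocol well, but as I understand it CRLF marks the end of a command; what follows might be garbage data or might be the next command. My guess is that your problem lies elsewhere. Set this aside for now: after curl_easy_perform returns, print the reply you received. Is any of the content missing?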

xdayong 2020-04-16

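Thanks for the reply above. I do handle multiple callbacks and buffer the data, and I return nmemb every time. The problem is that even so, the information I get is still wrong. The callback receives three parameters, char *ptr, size_t size, size_t nmemb: the first is a memory pointer whose valid region is 19 bytes (I found this by tracing the source), the second is 1, and the third is 6. So what I buffer is only 6 bytes. My main problem is that inside the callback I cannot determine the actual size of the valid data.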

_mervyn 2020-04-16

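Quoting _mervyn's reply (#1):

Each time this callback is invoked, it is passed as much of the currently available data as possible. You must not make any assumptions about it: it may be one byte, it may be thousands.

Let me correct the ambiguous part of what I said. curl does not hand you the accumulated total on every call; each call delivers only what was received that time. In other words, the content that ptr points to starts at a different place in the stream on every call.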
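
Since each call carries only its own chunk and the data is not NUL-terminated, the only bytes that are valid in a given call are the first size * nmemb bytes at ptr. A minimal debugging sketch, assuming you just want to see what each call delivers (`hex_dump` and `debug_chunk` are hypothetical names, not part of libcurl):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: render the first n bytes at ptr as "31 20 33 ..."
   into out. Returns how many bytes were rendered before out filled up. */
static size_t hex_dump(const char *ptr, size_t n, char *out, size_t out_size)
{
	size_t used = 0, i;

	assert(out_size > 0);
	out[0] = '\0';
	/* Each byte needs at most 3 characters plus the terminating NUL. */
	for (i = 0; i < n && used + 3 < out_size; i++) {
		used += (size_t)snprintf(out + used, out_size - used, "%s%02x",
		                         i ? " " : "", (unsigned char)ptr[i]);
	}
	return i;
}

/* Has the prototype documented for CURLOPT_WRITEFUNCTION. It only logs
   the chunk and reports every byte as handled. */
static size_t debug_chunk(char *ptr, size_t size, size_t nmemb, void *userdata)
{
	char line[256];
	size_t n = size * nmemb;   /* valid bytes in this call only */

	(void)userdata;
	hex_dump(ptr, n, line, sizeof line);
	fprintf(stderr, "chunk of %zu bytes: %s\n", n, line);
	return n;
}
```

Reading past those n bytes, for example by inspecting the receive buffer in a debugger, shows data that libcurl has not handed over in this call and may deliver later or strip.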

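Quoting _mervyn's reply (#1):

Having read the above, you should see that when you implement this callback you cannot assume it will be called only once, so you should store the data every time (append each chunk to your data structure).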

_mervyn 2020-04-16

The official documentation for the write callback: https://curl.haxx.se/libcurl/c/CURLOPT_WRITEFUNCTION.html Here are the most important points, with excerpts.

The callback is invoked as soon as data is available, so it may be called many times. On each call nmemb says how much data arrived this time, and size is always 1:

"This callback function gets called by libcurl as soon as there is data received that needs to be saved. For most transfers, this callback gets called many times and each invoke delivers another chunk of data. ptr points to the delivered data, and the size of that data is nmemb; size is always 1."

Each time the callback is invoked it is passed as much of the currently available data as possible. You must not make any assumptions about it: it may be one byte, it may be thousands:

"The callback function will be passed as much data as possible in all invokes, but you must not make any assumptions. It may be one byte, it may be thousands."

The callback may be called with zero bytes of data if the transferred file is empty. Also, the data passed up is not '\0'-terminated:

"This function may be called with zero bytes data if the transferred file is empty. The data passed to this function will not be zero terminated!"

Having read the above, you should see that when you implement this callback you cannot assume it will be called only once, so you should store the data every time (append each chunk to your data structure). That probably leaves one question: what should the callback return? Return the number of bytes you actually stored in that call. If that number differs from the number of bytes curl passed to the callback, the rest of the transfer is aborted and curl_easy_perform should return CURLE_WRITE_ERROR:

"Your callback should return the number of bytes actually taken care of. If that amount differs from the amount passed to your callback function, it'll signal an error condition to the library. This will cause the transfer to get aborted and the libcurl function used will return CURLE_WRITE_ERROR."
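
The points above can be put together in a small accumulating callback. This is a minimal sketch, assuming a heap buffer that grows with realloc; `reply_buf` and `append_chunk` are hypothetical names, and only the callback prototype comes from the documentation quoted above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, meant to be handed to the callback
   as its userdata pointer. */
struct reply_buf {
	char *data;
	size_t len;
};

/* Has the prototype documented for CURLOPT_WRITEFUNCTION. */
static size_t append_chunk(char *ptr, size_t size, size_t nmemb, void *userdata)
{
	struct reply_buf *buf = userdata;
	size_t n = size * nmemb;        /* bytes delivered in this call */
	char *grown;

	assert(buf != NULL);
	grown = realloc(buf->data, buf->len + n + 1);
	if (!grown)
		return 0;                   /* short count: libcurl aborts the transfer */

	memcpy(grown + buf->len, ptr, n);
	buf->data = grown;
	buf->len += n;
	buf->data[buf->len] = '\0';     /* our own terminator; libcurl adds none */
	return n;                       /* everything was stored */
}
```

With libcurl, the function would be registered through CURLOPT_WRITEFUNCTION and a pointer to a zero-initialized `struct reply_buf` passed through CURLOPT_WRITEDATA; the caller frees `data` when done. Returning nmemb alone happens to work because size is 1, but size * nmemb is the count the documentation describes.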

_mervyn 2020-04-16

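Quoting xdayong's reply (#5):

Nothing is missing from the reply, and CRLF is not necessarily the end of a command. But since there are multiple callbacks, how many bytes should I take each time, and what do I use as the marker to decide? If I don't know that, how do I assemble the data? For example, if I receive 30 bytes the first time and 40 bytes the second, then on the first call I cannot tell how much data to save into the buffer. Thanks for the answers above. I'll close the thread for now and keep looking for what else might be wrong.

It works as request and reply. Each time you issue a request with curl_easy_perform, curl_easy_perform blocks, and the reply to that request is fed to the callback you registered, chunk after chunk. You don't need to know when it ends; just keep collecting data. Once curl_easy_perform returns successfully, the data for that request's reply is complete and your callback will not be called again, until you send another request with the next curl_easy_perform.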
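
In other words, parsing can wait until curl_easy_perform has returned and the whole reply has been accumulated. A minimal sketch, assuming the accumulated reply is NUL-terminated by the caller and holds lines of the form shown in the original post ("1 3664", "2 1858"); `list_entry` and `parse_list` are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record for one line of the list: "<number> <octets>". */
struct list_entry {
	unsigned long number;
	unsigned long octets;
};

/* Parse a complete, NUL-terminated reply such as "1 3664\r\n2 1858\r\n".
   Stores at most max entries in out and returns how many were stored. */
static size_t parse_list(const char *reply, struct list_entry *out, size_t max)
{
	size_t count = 0;

	assert(reply != NULL && out != NULL);
	while (count < max) {
		char *end;
		unsigned long number, octets;

		number = strtoul(reply, &end, 10);  /* skips leading CR/LF/space */
		if (end == reply)
			break;                          /* no more digits: stop */
		reply = end;

		octets = strtoul(reply, &end, 10);
		if (end == reply)
			break;                          /* line without a size: stop */
		reply = end;

		out[count].number = number;
		out[count].octets = octets;
		count++;
	}
	return count;
}
```

Because parsing runs on the complete reply, it does not matter where the boundaries between individual callbacks fell.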

xdayong 2020-04-16

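Nothing is missing from the reply, and CRLF is not necessarily the end of a command. But since there are multiple callbacks, how many bytes should I take each time, and what do I use as the marker to decide? If I don't know that, how do I assemble the data? For example, if I receive 30 bytes the first time and 40 bytes the second, then on the first call I cannot tell how much data to save into the buffer. Thanks for the answers above. I'll close the thread for now and keep looking for what else might be wrong.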