A bug in Python's http.client module when reading chunked data

Author: 马育民 • 2019-01-12 22:39

# Bug description

When a client sends a request to a web server, the server may return the body in chunks. The first line of each chunk gives that chunk's length in hexadecimal. See the ```Transfer-Encoding``` header in [HTTP protocol](http://www.malaoshi.top/show_1EFIyjQT04v.html "HTTP protocol").

Normally, once all chunks have been sent, the web server sends a chunk of size 0 to signal that no data remains, as in the following response (the `0` near the end):

```
HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

25\r\n
This is the data in the first chunk\r\n
1C\r\n
and this is the second one\r\n
3\r\n
con\r\n
8\r\n
sequence\r\n
0\r\n
\r\n
```

Some sites, however, such as https://www.oschina.net/ (a Tengine server), send empty bytes instead when there are no more chunks. In that case the http.client module raises an error. The code is as follows:

```
import http.client

domain = 'www.oschina.net'
url = '/'

conn = http.client.HTTPSConnection(domain)
try:
    conn.request("GET", url)
    resp = conn.getresponse()
    print(resp.status, resp.reason, resp.version)
    print(resp.headers)
    if resp.status == 200:
        data = resp.read()
        # Use the charset from Content-Type if present, otherwise assume UTF-8
        charset = resp.headers.get_content_charset() or 'utf-8'
        print(data.decode(charset))
finally:
    # Close the connection whether or not the request succeeded
    conn.close()
```

The error is:

```
  File "c:/Users/mym/Desktop/python/http/client.py", line 572, in _readall_chunked
    raise IncompleteRead(b''.join(value))
__main__.IncompleteRead: IncompleteRead(81517 bytes read)
```

# Fixing the bug

The error occurs because the http.client module does not check for an empty line: when the stream ends, `readline()` returns `b''`, and `int(line, 16)` cannot parse it as a chunk size. The fix is as follows.

In the ```_read_next_chunk_size``` function of http/client.py (around line 506), add lines 4 and 5 of the listing below:

```
def _read_next_chunk_size(self):
    # Read the next chunk size from the file
    line = self.fp.readline(_MAXLINE + 1)
    if line == b'':  # an empty read means no more chunks, so return 0
        return 0
    if len(line) > _MAXLINE:
        raise LineTooLong("chunk size")
    i = line.find(b";")
    if i >= 0:
        line = line[:i]  # strip chunk-extensions
    try:
        return int(line, 16)
    except ValueError:
        # close the connection as protocol synchronisation is
        # probably lost
        self._close_conn()
        raise
```

Original source: http://www.malaoshi.top/show_1EF2bG0BoJJg.html
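
If patching the standard library is not an option, one possible alternative is to catch `http.client.IncompleteRead` in your own code and keep the bytes that did arrive, which the exception exposes as its `partial` attribute. The sketch below is a minimal illustration, not the module's own fix. `read_tolerant` and `FakeSocket` are hypothetical names introduced here. `FakeSocket` stands in for a real connection so the truncated response can be reproduced offline, and it relies on the assumption that `HTTPResponse` only calls `makefile()` on the socket it is given. The caller also has to assume that the partial body is in fact complete, since the terminating `0` chunk never arrives to confirm it.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket: serves canned bytes via makefile()."""

    def __init__(self, raw: bytes):
        self._file = io.BytesIO(raw)

    def makefile(self, *args, **kwargs):
        return self._file


def read_tolerant(resp: http.client.HTTPResponse) -> bytes:
    """Read the whole body; if the chunked stream is cut short, keep what arrived."""
    try:
        return resp.read()
    except http.client.IncompleteRead as exc:
        # exc.partial holds the bytes received before the stream ended
        return exc.partial


# A chunked response that ends without the terminating "0\r\n\r\n"
TRUNCATED = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\r\nhello\r\n"
    b"6\r\n world\r\n"
)

resp = http.client.HTTPResponse(FakeSocket(TRUNCATED))
resp.begin()  # parse the status line and headers
body = read_tolerant(resp)
print(body)
```

With a real connection, the same helper would be called as `read_tolerant(conn.getresponse())` in place of `resp.read()`.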