https://bugzilla.redhat.com/show_bug.cgi?id=1301928
Bug ID: 1301928
Summary: libxml2: out-of-bounds read in htmlParseNameComplex()
Product: Security Response
Component: vulnerability
Keywords: Security
Severity: medium
Priority: medium
Assignee: security-response-team@redhat.com
Reporter: mprpic@redhat.com
CC: athmanem@gmail.com, c.david86@gmail.com, erik-fedora@vanpienbroek.nl, fedora-mingw@lists.fedoraproject.org, ktietz@redhat.com, lfarkas@lfarkas.org, ohudlick@redhat.com, rjones@redhat.com, veillard@redhat.com
An out-of-bounds read flaw was reported in libxml2's htmlParseNameComplex() function:
http://seclists.org/oss-sec/2016/q1/199
A remote attacker could provide a specially crafted XML file that, when processed by an application linked against libxml2, could cause the application to crash.
Martin Prpic mprpic@redhat.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Depends On|                            |1301929
         Depends On|                            |1301930
         Depends On|                            |1301931
--- Comment #1 from Martin Prpic mprpic@redhat.com ---
Created libxml2 tracking bugs for this issue:
Affects: fedora-all [bug 1301929]
Referenced Bugs:
https://bugzilla.redhat.com/show_bug.cgi?id=1301929
[Bug 1301929] libxml2: out-of-bounds read in htmlParseNameComplex() [fedora-all]
https://bugzilla.redhat.com/show_bug.cgi?id=1301930
[Bug 1301930] mingw-libxml2: libxml2: out-of-bounds read in htmlParseNameComplex() [fedora-all]
https://bugzilla.redhat.com/show_bug.cgi?id=1301931
[Bug 1301931] mingw-libxml2: libxml2: out-of-bounds read in htmlParseNameComplex() [epel-7]
--- Comment #2 from Martin Prpic mprpic@redhat.com ---
Created mingw-libxml2 tracking bugs for this issue:
Affects: fedora-all [bug 1301930]
Affects: epel-7 [bug 1301931]
Martin Prpic mprpic@redhat.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Blocks|                            |1301932
Martin Prpic mprpic@redhat.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
              Alias|                            |CVE-2016-2073
Martin Prpic mprpic@redhat.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|libxml2: out-of-bounds read |CVE-2016-2073 libxml2:
                   |in htmlParseNameComplex()   |out-of-bounds read in
                   |                            |htmlParseNameComplex()
Cedric Buissart cbuissar@redhat.com changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cbuissar@redhat.com
--- Comment #3 from Cedric Buissart cbuissar@redhat.com ---
Below is my current understanding of this issue (which, I believe, is identical to 1304636):
The issue occurs when a word starts with normal ASCII characters and then continues with multibyte (non-ASCII) characters.
The issue is in htmlParseNameComplex(), more precisely in its while{} loop. The following happens:
vars:
len: the size of the word, in bytes. This is used to get back to the beginning of the word (i.e.: 'ctxt->input->cur - len')
c: the current character (can be multibyte)
l: the size in bytes of character c
The while loop finds the end of the word. The expectation is the following: 'ctxt->input->cur' points to the end of the word and len contains the word's length in bytes, thus 'ctxt->input->cur - len' points to the beginning of the word.
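To make the expectation concrete, here is a minimal sketch of such a scanning loop. It is not the libxml2 code: sketch_char_size(), sketch_is_name_byte() and sketch_scan_word() are hypothetical names, the name-character test is a crude stand-in, and UTF-8 is assumed as the multibyte encoding. It only shows how len accumulates the per-character size l so that 'cur - len' designates the word start.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: byte length of the UTF-8 sequence starting at p,
 * judged from the lead byte only (continuation bytes are not validated). */
static size_t sketch_char_size(const unsigned char *p)
{
    if (*p < 0x80) return 1;             /* ASCII */
    if ((*p & 0xE0) == 0xC0) return 2;
    if ((*p & 0xF0) == 0xE0) return 3;
    if ((*p & 0xF8) == 0xF0) return 4;
    return 1;                            /* invalid lead byte: count one byte */
}

/* Hypothetical stand-in for the real name-character test: anything that is
 * not NUL, whitespace, '<', '>' or '/' is treated as part of the word. */
static int sketch_is_name_byte(unsigned char c)
{
    return c != '\0' && c != ' ' && c != '\t' && c != '\n' &&
           c != '<' && c != '>' && c != '/';
}

/* Advance *cur to the end of the word and return the word length in bytes,
 * so that (*cur - returned length) is the beginning of the word. */
static size_t sketch_scan_word(const unsigned char **cur,
                               const unsigned char *end)
{
    size_t len = 0;

    assert(*cur <= end);
    while (*cur < end && sketch_is_name_byte(**cur)) {
        size_t l = sketch_char_size(*cur);   /* size of current character */
        if ((size_t)(end - *cur) < l)
            break;                           /* truncated sequence */
        *cur += l;
        len += l;
    }
    return len;
}
```

The invariant only holds as long as nothing moves the buffer content underneath cur while the loop is running.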
htmlCurrentChar() is called during the process (via the CUR_CHAR macro) and returns the next character (possibly multibyte) along with its size (l is updated). Inside htmlCurrentChar(), a multibyte/non-ASCII character leads to a change of encoding via the function xmlSwitchToEncodingInt(). During this switch, xmlBufShrink() is called. The purpose of this function is to remove the beginning of an XML buffer, via memmove. But the buffer is cut at the current character, not at the beginning of the word. Thus, in the process, ctxt->input->cur ends up pointing to the beginning of the string, and 'ctxt->input->cur - len' points before the beginning of the string.
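The effect can be reproduced with a simplified model. This is only a sketch under stated assumptions: struct sketch_input, sketch_shrink() and sketch_word_start_valid() are hypothetical, offsets are used instead of raw pointers so the broken state can be detected without actually reading out of bounds, and cur is rebased inside the shrink itself, which is a simplification of what the parser does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of a parser input. */
struct sketch_input {
    char   data[64];  /* buffer content */
    size_t used;      /* number of valid bytes in data */
    size_t cur;       /* offset of the current character */
};

/* Drop the first n bytes of the buffer with memmove, as a shrink would,
 * and rebase cur so that it still designates the same character. */
static void sketch_shrink(struct sketch_input *in, size_t n)
{
    assert(n <= in->used && n <= in->cur);
    memmove(in->data, in->data + n, in->used - n);
    in->used -= n;
    in->cur  -= n;
}

/* Return 1 if the word start (cur - len) still lies inside the buffer,
 * 0 if computing it would land before the beginning of the buffer. */
static int sketch_word_start_valid(const struct sketch_input *in, size_t len)
{
    return len <= in->cur;
}
```

With the input "<!DOCTYPE html" followed by a non-ASCII byte, cur is at offset 14 and the word "html" gives len = 4. Before the shrink the word start is valid; after sketch_shrink(&in, in.cur) the offset cur is 0 and the check fails, which corresponds to 'ctxt->input->cur - len' pointing before the string.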
The number of bytes to be removed is based on the following calculation:
[parserInternals.c:1202] processed = input->cur - input->base;
In order to keep the current word, htmlParseNameComplex's 'len' value should be subtracted here too, so that the shrinking stops at the beginning of the word instead of at the current character.
i.e.: my understanding is that xmlBufShrink() should not shrink beyond the beginning of the current word.
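The adjustment suggested above amounts to the following arithmetic. This is a sketch of the idea only, not a patch and not necessarily how the issue was addressed upstream; sketch_safe_shrink_amount() is a hypothetical helper, and cur_offset stands for 'input->cur - input->base'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of leading bytes that may be discarded when
 * the caller still needs the last word_len bytes located before cur. */
static size_t sketch_safe_shrink_amount(size_t cur_offset, size_t word_len)
{
    assert(word_len <= cur_offset);  /* the word must lie inside the buffer */
    return cur_offset - word_len;    /* stop at the beginning of the word */
}
```

For the trace below (cur at offset 14, word "html" of 4 bytes), this would discard 10 bytes instead of 14, leaving the word in the buffer.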
==============
Breakpoint 2, xmlBufShrink__internal_alias (buf=0x602550, len=len@entry=14) at buf.c:386
386 xmlBufShrink(xmlBufPtr buf, size_t len) {
(gdb) bt
#0 xmlBufShrink__internal_alias (buf=0x602550, len=len@entry=14) at buf.c:386
#1 0x00007ffff7aa8764 in xmlSwitchInputEncodingInt (ctxt=0x6045b0, input=0x605b30, handler=0x6023e0, len=45) at parserInternals.c:1194
#2 0x00007ffff7aa9b59 in xmlSwitchToEncodingInt (len=<optimized out>, handler=<optimized out>, ctxt=0x6045b0) at parserInternals.c:1272
#3 xmlSwitchEncoding__internal_alias (ctxt=ctxt@entry=0x6045b0, enc=enc@entry=XML_CHAR_ENCODING_8859_1) at parserInternals.c:1100
#4 0x00007ffff7ae7425 in htmlCurrentChar (ctxt=0x6045b0, len=0x7fffffffdc54) at HTMLparser.c:518
#5 0x00007ffff7ae77d5 in htmlParseNameComplex (ctxt=0x6045b0) at HTMLparser.c:2515
#6 htmlParseName (ctxt=ctxt@entry=0x6045b0) at HTMLparser.c:2483
#7 0x00007ffff7aa0a73 in htmlParseDocTypeDecl (ctxt=ctxt@entry=0x6045b0) at HTMLparser.c:3398
#8 0x00007ffff7aed52d in htmlParseTryOrFinish (terminate=<optimized out>, ctxt=<optimized out>) at HTMLparser.c:5440
#9 htmlParseChunk__internal_alias (ctxt=0x6045b0, chunk=<optimized out>, size=<optimized out>, terminate=0) at HTMLparser.c:6070
#10 0x00000000004007f4 in main (argc=1, arg=0x7fffffffdec8) at foo.c:25
Before the shrink:
(gdb) p *ctxt->input
$9 = {buf = 0x602500, [...], base = 0x6025a0 "<!DOCTYPE html\342\t</</body></html>", cur = 0x6025ae "\342\t</</body></html>", end = 0x6025c1 "", length = 0, line = 1, col = 15, consumed = 0, [...]}
After the shrink:
(gdb) p *ctxt->input
$15 = {buf = 0x602500, [...], base = 0x6025a0 "\342\t</</body></html>", cur = 0x6025ae "tml>", end = 0x6025c1 "", length = 0, line = 1, col = 15, consumed = 0, [...]}
The final input, after readjustment:
(gdb) p *ctxt->input
$23 = {buf = 0x602500, [...], base = 0x605d50 "â\t</</body></html>", cur = 0x605d50 "â\t</</body></html>", end = 0x605d64 "", length = 0, line = 1, col = 15, consumed = 0, [...]}