On 09/08/2010 06:03 PM, Denys Vlasenko wrote:
On Tue, 2010-08-31 at 16:30 +0200, Karel Klic wrote:
The new code has been written as a separate library (see the attached archive), because it became pretty large, containing a binary not used by the rest of ABRT, many tests, and several helper scripts. Within ABRT, it's used only by the CCpp plugin. I am not sure how to integrate it. Should we include it (as a separate project in a subdirectory) to ABRT's git repository?
I propose just giving it its own subdirectory in the ABRT code. When some other project will want it factored out, then we surely can separate it (or let that other project do it).
This allows you to not roll your own strbuf, xmalloc and such.
There are only few small functions that could be re-used from ABRT. I would rather keep the btparser independent: used by ABRT, but not the other way round. The C code has no dependencies except libc. With some publicity and a few more features people might want to use it in various bug management tools etc.
Review (in no particular order):
/**
- Checks whether the frame represents a call of function with certain
- function name.
*/ bool btp_frame_eq0(struct btp_frame *frame, const char *function_name); ... bool btp_frame_eq1(struct btp_frame *frame, const char *function_name, const char *source_file);
These functions beg for better names. Maybe: btp_frame_calls_func(frame, "func") btp_frame_calls_func_in_file(frame, "func", "file.c")
Yes, I used the names you proposed.
struct btp_thread * btp_thread_dup(struct btp_thread *thread, bool shallow, bool siblings); struct btp_frame * btp_frame_dup(struct btp_frame *frame, bool shallow, bool siblings);
You are using shallow copies only once for every of these functions, because you copy an on-stack structures in these cases. If you'll allocate them via malloc, then you'll never need shallow copies and can drop that parameter and code.
Done.
thread.h ^^^^^^^^ The name is too generic. To forestall collisions, I propose btp_thread.h
Should the btp_ prefix be also added to other header files, for the sake of consistency?
External programs using the thread.h do that by #include <btparser/thread.h>, so the collision does not seem likely in this case. The library itself contains #include "thread.h", but the compiler is supposed to look to the current directory first so this should work well.
#ifdef __cplusplus extern "C" { #endif
#include<stdio.h> #include<stdbool.h> ^^^^^^^^ I think they are better moved up, above #ifdef __cplusplus. (actually, stdio.h is not needed at all here...)
Both issues fixed in all header files.
/* Declarations. */<------- can be safely removed
Yup. Removed from all header files.
char *name = btp_malloc(strlen(frame->function_name) - 3); strcpy(name, frame->function_name + 4); free(frame->function_name); frame->function_name = name;
You can just char *name = [x/btp_]strdup(frame->function_name + 4); free(frame->function_name); frame->function_name = name; or even strcpy(frame->function_name, frame->function_name + 4); sans the regretful fact that strcpy() is not safe on overlapped strings; thus either open-code a safe version: char *p = frame->function_name; while ((*p = p[4]) != '\0') p++; or create and use helper function overlapping_strcpy().
I used the while loop you suggested.
#define NORMALIZE_ARCH_SPECIFIC(func) \ if (btp_frame_eq2(frame, "__" func "_sse2", func ".S", "libc.so") || \ btp_frame_eq2(frame, "__" func "_ssse3", func ".S", "libc.so") || \ btp_frame_eq2(frame, "__" func "_ia32", func ".S", "libc.so")) \ { \ free(frame->function_name); \ frame->function_name = btp_strdup(func); \ }
You can just strcpy instead of free + strdup.
Yes, fixed.
normalize.c:
#include "normalize.h" #include "normalize_dbus.h" #include "normalize_gdk.h" #include "normalize_glib.h" #include "normalize_glibc.h" #include "normalize_libstdcpp.h" #include "normalize_linux.h" #include "normalize_xorg.h"
You will need to touch this file every time you add a new normalizer. You can reduce your future work by consolidating all normalize_foo.h files into one header (they are all very similar and simple, unlike corresponding .c files), thus you will need to create one less file and add one less #include per each new btp_normalize_foo_thread().
Agreed. I merged all those headers into normalize.h.
static char * findfirstabnul(char *input, const char *a, const char *b) { size_t alen = strlen(a); size_t blen = strlen(b); char *p = input; while (*p) { if (strncmp(p, a, alen) == 0) return p; if (strncmp(p, b, blen) == 0) return p; ++p; } return p; }
Looks like it is possible to use strcmp's instead of strncmp's and drop {ab}len and strlen calls
I do not understand how to do it. This function searches the input for prefix a or b. So only a part of input with the lenght of a/b is checked.
bool btp_parse_regexp(char **input, const char *pattern, char **result);
Can use a simpler API here:
/* Returns NULL on regexp error or no match */ char* btp_parse_regexp(char **input, const char *pattern);
It is a bit better because it looks a bit simpler in C, and is more efficient (old one takes addresses of pointers, forcing them to use stack, not CPU registers). The difference is small but positive on both fronts.
Agreed. As for this particular instance, the btp_parse_regexp function was removed.
Similarly:
bool btp_backtrace_parse(char **input, struct btp_backtrace **backtrace, struct btp_location *location)
can use a bit different API:
/* Returns NULL on error */ struct btp_backtrace* btp_backtrace_parse(char **input, struct btp_location *location)
The API was changed as you proposed.
Similarly for btp_thread_parse(), btp_frame_parse_file_location() etc...
Done.
static char * btp_get_regerror(int errcode, regex_t *compiled) { size_t length = regerror(errcode, compiled, NULL, 0); char *buffer = btp_malloc(length); (void) regerror(errcode, compiled, buffer, length); return buffer; } ... int regresult = regcomp(&compiled, pattern, REG_EXTENDED); if (regresult != 0) { char *error = btp_get_regerror(regresult,&compiled); fprintf(stderr, "btp: Error while compiling regexp: %s", error); free(error); ... regresult = regexec(&compiled, *input, 1,&matchptr, 0); if (regresult != 0) /* REG_NOMATCH or some error */ { if (regresult != REG_NOMATCH) { char *error = btp_get_regerror(regresult,&compiled); fprintf(stderr, "btp: Error while matching regexp: %s", error); free(error);
You do the same thing with btp_get_regerror result in all two cases: print and free. Why not move these ops *into* btp_get_regerror instead?
Makes sense, but I removed the regexp stuff as it ended up unused.
/**
- A location in the backtrace file with an attached message.
- It's used for error reporting: the line and the column points to
- the place where a parser error occurred, and the message explains
- what the parser expected and didn't find on that place.
*/ struct btp_location { /** Starts from 1. */ int line; /** Starts from 0. */ int column; /** * Error message related to the line and column. Do not release * the memory this pointer points to. */ const char *message; };
IIUC you are trying to get accurate error reporting, and for that you are recording the position while you consume input, and record message and return with it on error. End user uses this machinery like this:
btp_thread_parse(&input_char_ptr, ..., ptr_to_location_struct); if (ERROR) { printf("%d:%d: %s, run for your lives", location.line, location.column, location.message); }
The goal is sound, but the price you pay to achieve it seems to be high in terms of code simplicity and readability.
IMHO the added complexity is worth the benefits. The location struct is not really complex, and the column/row counting code even helped me to discover some syntax corner cases which were previously not handled.
I would like to make it simpler. How about (1) printing (logging) error message on the spot instead of propagating it back to caller? (2) passing two char* pointers down to callees:
btp_thread_parse(&input_char_ptr, ...): start = input_char_ptr; while (...) -> btp_thread_parse(start,&input_char_ptr, ....) -> btp_frame_parse(start,&input_char_ptr, ....) if (ERROR DETECTED) btp_log_error(start, *input_char_ptr, "Titanic hit the iceberg") -> calculates line:col by walking from start to cur = *input_char_ptr and counting \n and other chars, then log("%d:%d: %s", line, col, msg)
This way, not only struct btp_location is not needed anymore, but also a lot of "location->column += chars" statements can be dropped, along with calls to btp_location_add* and btp_location_eat*. They don't need to be replaced, they simply aren't needed at all.
I think the location code provides a good feature and it should not cause problems. It seems we have different opinion about the code simplicity/features balance.
Thank you for the hints. The code looks much better now.
Please see the attached new version of btparser and updated patches integrating it to ABRT as a subproject. The btparser should be placed in abrt/btparser to make the patches work.