site-josuah

/usr/josuah

commit 42e99fd5049d0a6a153b2fda1f6ffdb85bb8b077
parent faae0c480b166f40ad81995dd7bcf81c4aa1d167
Author: Josuah Demangeon <me@josuah.net>
Date:   Fri,  7 Aug 2020 00:42:50 +0200

add tips about compound literals

Diffstat:
D wiki/c-coding-style/index.md | 353 -------------------------------------------------------------------------------
A wiki/c-programming/index.md  | 394 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 394 insertions(+), 353 deletions(-)

diff --git a/wiki/c-coding-style/index.md b/wiki/c-coding-style/index.md
@@ -1,353 +0,0 @@
-Bits of C programming style
-===========================
-that helps to isolate unrelated parts of your programs.

diff --git a/wiki/c-programming/index.md b/wiki/c-programming/index.md
@@ -0,0 +1,394 @@
Bits of C programming
=====================
Coding style that helps isolate unrelated parts of your programs and strives to
give them a sane and simple API.

Naming convention and project hierarchy
---------------------------------------
* ./Makefile (and config.mk, content.mk, Makefile.linux, Makefile.bsd...)
* ./README, ./README.md
* ./progname.c
* ./doc/rfc3564.txt, ./doc/progname.1
* ./src/base64.c, ./src/random.c, ./src/conf.c

Then, function names follow the name of their file in ./src/:

* base64_decode(), base64_encode(), ...
* random_bytes(), random_u64(), random_u32(), random_mem(), ...
* conf_read_file(), conf_parse(), conf_get(), conf_dump(), ...

assert(), assert(), assert()
----------------------------
* A little bit closer to memory-safe languages in some cases.
* No performance overhead in production: assertions are compiled out when
  building with -DNDEBUG.
* assert or unit tests: program internals.
* assert or unit tests: fuzzing.
* assert and unit tests: larger libraries and code bases.
* For libraries: a helper program.

Error handling
--------------
While other programming languages provide special keywords like try/catch and
features like exceptions or automatic cleanup at function exit (useful when
returning early, such as when an error occurs), C lets the user handle errors
with `if (failure_condition) { handle_failure; }`.
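A minimal sketch of that explicit style, using only the standard library: `fopen()` reports failure by returning NULL (and, on POSIX systems, by setting `errno`), and the caller checks the result right away. The helper name `open_or_report()` is hypothetical, chosen for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: open path for reading.
 * On failure, print the reason taken from errno and return NULL.
 */
FILE *
open_or_report(char const *path)
{
	FILE *fp;

	fp = fopen(path, "r");
	if (fp == NULL) {		/* failure_condition */
		fprintf(stderr, "%s: %s\n", path, strerror(errno));
		return NULL;		/* handle_failure: let the caller decide */
	}
	return fp;
}
```

The caller repeats the same check at its own level, for instance `if ((fp = open_or_report(path)) == NULL) return 1;`. Nothing happens implicitly: every failure path is visible in the code.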

Some idioms make error handling in C as unobtrusive as in other languages:

### Error handling if() style

```
	mem = malloc(3);
	if (mem == NULL)
		return NULL;
-vs-
	if ((mem = malloc(3)) == NULL)
		return NULL;
```

### Error handling with goto

```
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	mem = malloc(3);
	if (mem == NULL) {
		close(fd);
		return -1;
	}

	if (do_something_1() < 0) {
		close(fd);
		free(mem);
		return -1;
	}

	if (do_something_2() < 0) {
		close(fd);
		free(mem);
		return -1;
	}

	return 0;

-vs-
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	mem = malloc(3);
	if (mem == NULL)
		goto err_close;

	if (do_something_1() < 0)
		goto err_free_close;

	if (do_something_2() < 0)
		goto err_free_close;

	return 0;

	/* labels in reverse order of acquisition: each one falls through to the next */
err_free_close:
	free(mem);
err_close:
	close(fd);
	return -1;

-vs-
	fd = -1;
	mem = NULL;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		goto err;

	mem = malloc(3);
	if (mem == NULL)
		goto err;

	if (do_something_1() < 0)
		goto err;

	if (do_something_2() < 0)
		goto err;

	return 0;
err:
	if (fd >= 0)
		close(fd);	/* skipped if open() failed */
	free(mem);		/* does nothing if mem == NULL */
	return -1;
```

Known as "the only right use of goto".

### Error handling with enums + switch + *_errstr

* Because printing errors from within a library interferes with the program's
  I/O (widely considered bad practice).
* Because 0 for success and -1 for error (or 1 for success and 0 for error)
  does not tell us much...
* try { ... } catch (errortype) { ... } in plain C for free!

```
	enum conf_errno {
		CONF_ERR_SYSTEM = 1,	/* 0 is left for success */
		CONF_ERR_SYNTAX,
	};

	int
	conf_read_file(char const *path, struct conf *cf)
	{
		int fd, err;

		fd = open(path, O_RDONLY);
		if (fd < 0)
			return -CONF_ERR_SYSTEM;

		err = conf_parse(fd, cf);	/* 0 or -CONF_ERR_* */
		close(fd);
		return err;
	}
...
	char const *
	conf_errstr(int i)
	{
		enum conf_errno err = (i > 0) ? i : -i;

		switch (err) {
		case CONF_ERR_SYSTEM:
			return "system error";
		case CONF_ERR_SYNTAX:
			return "syntax error";
		}
		assert(!"all errno should have been handled before");
		return "unknown error"; /* make compiler happy */
	}
...
	int
	main(int argc, char **argv)
	{
		struct conf cf = {0};
		int err;

		if (argc != 2) {
			fprintf(stderr, "usage: %s file\n", argv[0]);
			return 1;
		}

		err = conf_read_file(argv[1], &cf);
		switch (-err) {
		case 0:
			break;
		case CONF_ERR_SYSTEM:
			fprintf(stderr, "%s: %s: %s\n",
			    argv[0], conf_errstr(err), strerror(errno));
			return 1;
		default:
			fprintf(stderr, "%s: %s\n",
			    argv[0], conf_errstr(err));
			return 1;
		}

		return 0;
	}
```

(Ab)use of types
----------------
When abused, types make the code more opaque and harder to read. But used
wisely, they are the key to handling structured data and building useful
abstractions, which often come at no extra cost in compiled languages that
have no runtime type system (only a compile-time one).

### struct is the key

What if there are more parameters to pass? Change all the functions?

```
	do_something(buf1, len1, prop1, buf2, len2, prop2, buf3, len3, prop3);
-vs-
	do_something(struct1, struct2, struct3);
```

Structs give an outline of the data structures that the program uses:

> Choose the right data structures and the program will write itself

### enum is the sight

In switch statements and in code:

```
	switch (get_state()) {
	case 2:
		do_it_again();
		break;
	case 1:
		next_step();
		break;
	}
-vs-
	enum state { DONE = 1, PARTIAL_DATA = 2, TOO_MUCH_DATA = 5 };

	switch (get_state()) {	/* get_state() returns an enum state */
	case PARTIAL_DATA:
		do_it_again();
		break;
	case DONE:
		next_step();
		break;
	}
	/* compiler warning for missing TOO_MUCH_DATA (-Wswitch, enabled by -Wall in GCC) */
```

Initialize array elements by name:

```
	char const *state_to_description[] = {
		NULL,
		"all done, goodbye",
		"partial data read, doing it again",
		NULL,
		NULL,
		"too much input read, erroring out",
	};
-vs-
	char const *state_to_description[] = {
		[DONE] = "all done, goodbye",
		[PARTIAL_DATA] = "partial data read, doing it again",
		[TOO_MUCH_DATA] = "too much input read, erroring out",
	};
```

Memory management
-----------------
C memory management revolves around pointers: variables holding the address of
a region of memory that the program may use.

To grow a region of memory (a buffer), it is sometimes necessary to move it
somewhere else, which changes the pointer: the address where the buffer is
located.

Pointers are also used by many data structures to refer to another element,
such as the `->next` element of a linked list.

If a buffer is referred to by one of these data structures and the buffer grows
(and moves), the reference becomes invalid.

Combining memory management and pointers therefore requires knowing whether a
buffer is immutable (will not change anymore) or may still be modified:

### Same struct type, various sizes:

```
	struct obj {
		/* fixed part: its length is given by sizeof(struct obj) */
		char *name, *description;
		size_t len;
		struct obj *next;
		/* flexible array member: variable length at the end of the struct */
		char buf[];
	};

	struct obj *
	obj_parse_new(char const *input, struct obj **first)
	{
		struct obj *new;
		size_t len;

		len = strlen(input);
		new = calloc(1, sizeof *new + len);
		if (new == NULL)
			return NULL;

		memcpy(new->buf, input, len);
		new->len = len;

		if (obj_parse(new) < 0)
			goto err;

		new->next = *first;
		*first = new;
		return new;
	err:
		free(new);
		return NULL;
	}
```

### Immutable pointers and linked lists

Not an immutable pointer: the pointer changes as the memory grows:

```
	struct obj *
	obj_grow(struct obj *obj)
	{
		struct obj *mem;

		mem = realloc(obj, sizeof *obj + obj->len + 10);
		if (mem == NULL)
			return NULL;
		mem->len += 10;
		return mem;	/* may differ from obj: any ->next pointing to obj is now invalid */
	}
```

This is what we want for linked lists: the pointer does not change as the
memory grows.

```
	int
	obj_grow(struct obj *obj)
	{
		char *mem;

		/* here, buf is declared as "char *buf;", not as a flexible array member */
		mem = realloc(obj->buf, obj->len + 10);
		if (mem == NULL)
			return -1;
		obj->buf = mem;
		obj->len += 10;
		return 0;
	}
```

Use the variable-size struct member shown before only when the size is known
at allocation time and does not change afterwards.

Safe wrapper for string and memory management
---------------------------------------------
C is a very unsafe programming language when used badly. It is still unsafe
when used carefully, but slightly less so, and even less with simple wrappers
over the buffer and memory management primitives.

Calculating the length and then filling the memory is dangerous: length
miscalculations happen often, and this is a very frequent operation (when
growing strings and buffers).

These two operations need to be coupled.

Reentrant function without specifying the buffer
------------------------------------------------
Reentrancy is the property of a function that keeps working correctly even when
it is executed several times at once (for instance from several threads).
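To make the definition concrete, here is a sketch with two hypothetical functions that format an integer: `fmt_static()` keeps its result in a static buffer shared by every call, so it is not reentrant, while `fmt_r()` writes into a buffer provided by the caller.

```c
#include <stdio.h>

/* Not reentrant: every call writes into, and returns, the same static buffer. */
char const *
fmt_static(int num)
{
	static char buf[32];

	snprintf(buf, sizeof buf, "%d", num);
	return buf;
}

/* Reentrant: the result lives in memory owned by the caller. */
char *
fmt_r(int num, char *buf, size_t size)
{
	snprintf(buf, size, "%d", num);
	return buf;
}
```

After `a = fmt_static(1); b = fmt_static(2);`, both `a` and `b` point to the same buffer, which now holds "2": the second call silently overwrote the first result, and two threads calling it at once would race on that buffer. With `fmt_r()`, each caller keeps its own result.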

One common cause for a function to fail at this is the use of global (or
static) variables.

A temptation to use a global variable, so frequent that even the libc is built
around it, is to save the developer from passing a buffer to fill.

For example, localtime() returns a `struct tm *`, but no such structure is
passed to the function: the same global or static variable is used for all
calls to localtime(), which makes it unsafe to use with threads. localtime_r()
is to be used instead.

It is possible to pass a chunk of the calling function's stack memory without
declaring a variable, through a compound literal:

```
	char buf[sizeof "YYYY-MM-DD"];

	/* clock is a time_t const *; &(struct tm){0} is an unnamed struct tm on the stack */
	strftime(buf, sizeof buf, "%Y-%m-%d", localtime_r(clock, &(struct tm){0}));
```

This is written inline, but if `localtime()` had been defined as a macro
expanding to this expression, `localtime()` would have been reentrant with the
same convenience it currently has:

```
	#define localtime(clock) localtime_r((clock), &(struct tm){0})
```

The compound literal lives until the end of the enclosing block, so the
returned pointer must not be used after that block ends.

This can also be convenient to write formatters that take structures or
integers as input and return a string that can be passed directly to printf()
or similar:

```
	printf("...%s...", fmt(num));
```
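A minimal sketch of such a formatter, assuming a hypothetical `fmt_r()` that writes an integer into a caller-provided buffer. The `fmt()` macro hands it a fresh compound literal `(char[FMT_SIZE]){0}`, so each use gets its own buffer in the caller's block, unlike a function returning a static buffer. The buffer size `FMT_SIZE` (32) is an assumption, large enough for any `long long` in decimal.

```c
#include <stdio.h>

/* Size of each temporary buffer: enough for any long long in decimal. */
#define FMT_SIZE 32

/* Hypothetical formatter: write num into buf and return buf. */
char *
fmt_r(long long num, char *buf, size_t size)
{
	snprintf(buf, size, "%lld", num);
	return buf;
}

/*
 * Every expansion creates a new unnamed char array (a compound literal)
 * in the caller's block, so several fmt() calls in one expression do not
 * share a buffer. The array only lives until the end of that block.
 */
#define fmt(num) fmt_r((num), (char[FMT_SIZE]){0}, FMT_SIZE)
```

For instance, `printf("%s and %s\n", fmt(1), fmt(-42));` prints `1 and -42`, since each `fmt()` expansion formats into its own array.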