From c865a775cdb9e8f49cc80f0909b3b1003c5aa15f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jonathan=20Sch=C3=B6bel?= <jonathan@xn--schbel-yxa.info> Date: Tue, 10 Oct 2023 18:20:16 +0200 Subject: [PATCH] docs: collect commit messages #6 Up to (and including) commit 7e53e866a65db633d29ccdabc4ddd63366e9cef5 'Merge branch 'feature/fragment'' Thus now all commit messages are collected. --- docs/commit_messages.txt | 161 ++++++++++++++++++++++++++++++++++++++- sefht.geany | 2 +- 2 files changed, 159 insertions(+), 4 deletions(-) diff --git a/docs/commit_messages.txt b/docs/commit_messages.txt index 178b4d0..ac42e9d 100644 --- a/docs/commit_messages.txt +++ b/docs/commit_messages.txt @@ -228,6 +228,14 @@ Data: for modules, manages the database connection and maybe also contains some caches. At the moment it only provides access to the Validator. + The two predicates SH_Data_check_tag and SH_Data_check_attr are + wrappers to the appropriate methods of the validator. These are + needed, as there shouldn't be direct calls to the internal + structure of SH_Data. + The modifying methods are not exposed, as the validator + shouldn't be changed while others depend on it, this has to be + implemented later. + Data also contains a wrapper for the self-closing tag predicate. Attr: The structure SH_Attr implements an HTML Attribute. @@ -284,6 +292,10 @@ Fragment: possible, as this would lead to problems e.g. double free or similar quirks. + NodeFragment now uses the validator to validate the tags. The + attributes aren't validated yet, as this is more complicated, + because the tag is needed for that. + The single method (formerly SH_NodeFragment_append_child) to add a child at the end of the child list was replaced, by a bunch of methods to insert a child at the beginning (SH_NodeFragment_prepend_child), at the @@ -357,6 +369,8 @@ Fragment: A Fragment can output it's html. If there is an error the method aborts and returns NULL. 
+ This method also pays attention to self-closing tags, which is + determined via the validator. When the wrap mode is used, after each tag a newline is started. Also the html is indented, which can be configured by the parameters indent_base, indent_step and indent_char. The @@ -454,6 +468,149 @@ Validator: 72(80)-column rule. It can't be abided without severely impacting the readability of the code. + Originally the ids were intended to be useful for linking different + information together internally, and for providing references + externally. However, they weren't used internally, for this, pointers + seemed to be more useful, as they also allow to directly access the data + and also have a relation defined. + Regarding reference purposes, they aren't really needed, and it is more + convenient to directly use some strings, and they aren't more + performant, as there still have to be internal checks and looking for an + int isn't more performant, than looking for a pointer. + Also, they have to be stored, so they need more memory and also some + code, to be handled. + + While it was very clever, the complex data structure of the tag array + introduced in 'Validator: restructured internal data (a0c9bb2)' comes + with a lot of runtime overhead. It reduces the calls to free and + realloc, when a lot of tags are deleted and inserted subsequently, but + burdens each call with a loop over the linked list of free blocks. + + This is even more important, as validator must be fast in checking, as + this is done every time something is inserted into the DOM-tree, but has + not so tight requirements for registering new tags, as this is merely + done at startup time. + + As the access must be fast, the tags are sorted when inserted, so that + the search can take place in log-time. + + There is a method to add a set of tags to a validator on initialisation.
+ First this relieves a user application of the burden of maintaining the + html spec and also is more performant, as a lot of tags are to be + inserted at once, so there aren't multiple allocation calls. + As the validator needs the tags to be in order, the tags must be sorted + on insertion. Of course it would be easier for the code, if the tags + were already in order, but first there could be easily a mistake and + second sorting the tags by an algorithm allows the tags to be specified + in a logically grouped and thus more maintainable order. + For the sorting, insertion sort is used. Of course it has a worse + quadratic time complexity, but in a constructor, I wouldn't introduce + the overhead of memory management a heap- or mergesort would introduce + and in-place sorting is also out, because the data lies in ro-memory. + Thus I chose an algorithm with constant space complexity. Also the + 'long' running time is not so important, as the initialization only runs + at startup once and the tags are not likely to exceed a few hundred so + even a quadratic time isn't that bad. + + Each tag has a type as defined by the html spec. This must be provided + on registration. Implicitly registering tags, when an attribute is + registered can't be done anymore, as the type information would be + missing. + The added parameter in register_tag, as well as the change of behaviour in + register_attr has broken a lot of tests, that had to be adjusted + therefore. + + Added self-closing predicate. Other predicates may follow. + + The Validator contains already all HTML5 tags. + Tags according to: + https://html.spec.whatwg.org/dev/indices.html#elements-3 + + Types according to: + https://html.spec.whatwg.org/multipage/syntax.html#elements-2 + + Retrieved 04. 10. 2023 + + + An attribute can be deregistered by calling SH_Validator_deregister_attr.
+ Note that deregistering an attr, that was never registered is considered + an error, but this may change, as technically it is not registered + afterwards and sometimes (e.g. for a blacklist) it might be preferable + to ensure, that a specific attr is not registered, but it is not clear + whether there should be an error or not. + Also the deallocating of the data used for an attr was moved to an extra + method, as this is needed in several locations and it might be subject + to change. + + The Validator can check if an attribute is allowed in a tag. It does so + by associating allowed tags with attributes. This is done in that way, + to support also attributes which are allowed for every tag (global + attributes), but this is not yet supported. So some functions allow for + NULL to be passed and some will still crash. + + The predicate SH_Validator_check_attr returns whether an attribute is + allowed for a specific tag. If tag is NULL, it returns whether an attr + is allowed at all, not whether it is allowed for every tag. For this + another predicate will be provided, when this is to be implemented. + + The method SH_Validator_register_attr registers a tag-attr combination. + Note, that it will automatically call SH_Validator_register_tag, if the + tag doesn't exist. Later it will be possible, to set tag to NULL to + register a global attribute, but for now the method will crash. + + The method SH_Validator_deregister_attr removes a tag-attr combination + registered earlier. Note, that deregistering a nonexistent combination + will result in an error. This behaviour is arguable and might be subject + to change. When setting only tag to NULL, all tags for this attribute + are deregistered. When setting only attr to NULL, all attrs for this tag + are deregistered. This might suffer from problems, if this involves some + attrs, that are global.
Also this will use the internal method + remove_tag_for_all_attrs, which has the problem, that it might fail + partially. Normally when failing all functions revert the program to the + same state, as it was before the call. This function however is + different, as if it fails there might be some combinations, that haven't + been removed, while others already have. Nevertheless, the validator is + still in a valid state, so it is possible to call this function a second + time, but it is not certain which combinations are already deregistered. + + As the attrs also use the internal strings of the tags, it must be + ensured, when a tag is deregistered, that all remaining references are + removed, otherwise there would be dangling pointers. Note, that for this + also remove_tag_for_all_attrs is used, so the method + SH_Validator_deregister_tag suffers from the same problems listed above. + Also if this internal method fails, the tag won't be removed at all. + + Similar to the tags, the attributes can be initialized. Missing tags are + automatically added. The declaration syntax is currently a bit annoying, + as the tags, that belong to an attribute, either have to be declared + explicitly or a pointer to the tag declaration must be given, but then + only concurrent tags are possible. + Support for global attributes is likewise missing; it must be ensured, + that (tag_n != 0) && (tags != NULL). Otherwise validator will be + inconsistent and there might be a bug. + + Global attributes are represented by empty attributes. A global + attribute is an attribute, that is accepted for any tag. + It is refused to remove a specific tag for a global attribute, as this + would mean to "localize" the tag, thus making it not global anymore. + The method to do that and a predicate for globalness are still missing. + + Deregistering a global attribute normally is not possible, as basically + every other tag has to be added. This was implemented now.
+ Originally it was intended to provide the caller with the information, + that a global attribute has to be converted into a local one before + removal. However such internals should not be exposed to the caller. As + it stands there is no real reason to inform a caller, whether an + attribute is local or global. Also, there is a problem that the + predicate is burdened with the possibility, that the attribute doesn't + exist, thus it can't return a boolean directly. Both are why the + predicate isn't added yet. + Also a bug was detected in the method remove_tag_for_all_attrs. It + removes an attribute while also iterating over it, thus potentially + skipping over some attribute and maybe also invoking undefined behaviour + by deallocating space after the array. + + Copying a Validator could be useful if multiple html versions are to be supported. Another use case is a blacklist XSS-Scanner. @@ -565,9 +722,7 @@ Tests: passed to another unit. Because sometimes an overflow condition is checked, it is necessary to include the sourcefile into the test, instead of - linking against the objectfile. This also allows for the - separate testing of static functions, as the static keyword - can be overridden with an empty macro. + linking against the objectfile. Sometimes it isn't possible to check for correct overflow detection by setting some number to ..._MAX, because this number is used, thus a SIGSEGV would be raised. This is solved
This is solved diff --git a/sefht.geany b/sefht.geany index 8a2303e..6f6c4df 100644 --- a/sefht.geany +++ b/sefht.geany @@ -34,7 +34,7 @@ FILE_NAME_1=134;None;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2F FILE_NAME_2=1737;Sh;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fconfigure.ac;0;8 FILE_NAME_3=73;Make;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2FMakefile.am;0;8 FILE_NAME_4=19;C;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Fmain.c;0;8 -FILE_NAME_5=3555;None;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fdocs%2Fcommit_messages.txt;0;8 +FILE_NAME_5=31034;None;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fdocs%2Fcommit_messages.txt;0;8 FILE_NAME_6=1867;Make;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Flib%2FMakefile.am;0;8 FILE_NAME_7=18;C;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Flib%2Fsefht%2Fcms.c;0;8 FILE_NAME_8=18;C;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Flib%2Fsefht%2Fcms.h;0;8 -- GitLab