docs: collect commit messages #6

Up to (and including) commit 7e53e866 'Merge branch 'feature/fragment''. Thus all commit messages are now collected.

Commit c865a775 authored 1 year ago by Jonathan Schöbel
--- a/docs/commit_messages.txt
+++ b/docs/commit_messages.txt
@@ -228,6 +228,14 @@ Data:
 	for modules, manages the database connection and maybe also
 	contains some caches. At the moment it only provides access to
 	the Validator.
+	The two predicates SH_Data_check_tag and SH_Data_check_attr are
+	wrappers around the corresponding methods of the validator. They
+	are needed, as there shouldn't be direct calls into the internal
+	structure of SH_Data.
+	The modifying methods are not exposed, as the validator
+	shouldn't be changed while others depend on it; exposing them
+	has to be implemented later.
+	Data also contains a wrapper for the self-closing tag predicate.

 Attr:
 	The structure SH_Attr implements an HTML Attribute.
@@ -284,6 +292,10 @@ Fragment:
 	possible, as this would lead to problems e.g. double free or
 	similar quirks.

+	NodeFragment now uses the validator to validate the tags. The
+	attributes aren't validated yet, as this is more complicated,
+	because the tag is needed for that.
+
 	The single method (formerly SH_NodeFragment_append_child) to add a child
 	at the end of the child list was replaced, by a bunch of methods to
 	insert a child at the beginning (SH_NodeFragment_prepend_child), at the
@@ -357,6 +369,8 @@ Fragment:

 	A Fragment can output it's html. If there is an error the method
 	aborts and returns NULL.
+	This method also handles self-closing tags; whether a tag is
+	self-closing is determined via the validator.
 	When the wrap mode is used, after each tag a newline is started.
 	Also the html is indented, which can be configured by the
 	parameters indent_base, indent_step and indent_char. The
@@ -454,6 +468,149 @@ Validator:
 	72(80)-column rule. It can't be abided without severely impacting the
 	readability of the code.

+	Originally the ids were intended to be useful for linking different
+	information together internally, and for providing references
+	externally. However, they weren't used internally; for this, pointers
+	seemed to be more useful, as they also allow direct access to the data
+	and also have a relation defined.
+	Regarding reference purposes, they aren't really needed: it is more
+	convenient to use strings directly, and ids aren't more performant,
+	as there still have to be internal checks, and looking for an int
+	isn't faster than looking for a pointer.
+	Also, they have to be stored, so they need more memory and also some
+	code to be handled.
+
+	While it was very clever, the complex data structure of the tag array
+	introduced in 'Validator: restructured internal data (a0c9bb2)' comes
+	with a lot of runtime overhead. It reduces the calls to free and
+	realloc, when a lot of tags are deleted and inserted subsequently, but
+	burdens each call with a loop over the linked list of free blocks.
+
+	This matters all the more, as the validator must be fast at checking,
+	which is done every time something is inserted into the DOM-tree, but
+	has less strict requirements for registering new tags, as this is
+	mainly done at startup time.
+
+	As the access must be fast, the tags are sorted when inserted, so that
+	the search can take place in logarithmic time.
+
+	There is a method to add a set of tags to a validator on initialisation.
+	First, this relieves a user application of the burden of maintaining the
+	html spec; second, it is more performant, as a lot of tags are inserted
+	at once, so there aren't multiple allocation calls.
+	As the validator needs the tags to be in order, the tags must be sorted
+	on insertion. Of course it would be easier for the code, if the tags
+	were already in order, but first, a mistake could easily be made, and
+	second, sorting the tags by an algorithm allows the tags to be specified
+	in a logically grouped and thus more maintainable order.
+	For the sorting, insertion sort is used. Of course it has a worse,
+	quadratic time complexity, but in a constructor, I wouldn't introduce
+	the overhead of memory management a heap- or mergesort would introduce,
+	and in-place sorting is also out, because the data lies in ro-memory.
+	Thus I chose an algorithm with constant space complexity. Also the
+	'long' running time is not so important, as the initialization only runs
+	once at startup and the tags are not likely to exceed a few hundred, so
+	even a quadratic time isn't that bad.
+
+	Each tag has a type as defined by the html spec. This must be provided
+	on registration. Implicitly registering tags when an attribute is
+	registered can't be done anymore, as the type information would be
+	missing.
+	The added parameter in register_tag, as well as the change of behaviour
+	in register_attr, has broken a lot of tests, which had to be adjusted
+	accordingly.
+
+	Added self-closing predicate. Other predicates may follow.
+
+	The Validator already contains all HTML5 tags.
+	Tags according to:
+	https://html.spec.whatwg.org/dev/indices.html#elements-3
+
+	Types according to:
+	https://html.spec.whatwg.org/multipage/syntax.html#elements-2
+
+	Retrieved 04. 10. 2023
+
+
+	An attribute can be deregistered by calling SH_Validator_deregister_attr.
+	Note that deregistering an attr that was never registered is considered
+	an error, but this may change, as technically it is not registered
+	afterwards and sometimes (e.g. for a blacklist) it might be preferable
+	to ensure that a specific attr is not registered; it is not yet clear
+	whether there should be an error or not.
+	Also the deallocation of the data used for an attr was moved to an extra
+	method, as this is needed in several locations and it might be subject
+	to change.
+
+	The Validator can check if an attribute is allowed in a tag. It does so
+	by associating allowed tags with attributes. It is done this way in
+	order to also support attributes which are allowed for every tag (global
+	attributes), but this is not yet supported. So some functions allow for
+	NULL to be passed and some will still crash.
+
+	The predicate SH_Validator_check_attr returns whether an attribute is
+	allowed for a specific tag. If tag is NULL, it returns whether an attr
+	is allowed at all, not whether it is allowed for every tag. For the
+	latter, another predicate will be provided once it is implemented.
+
+	The method SH_Validator_register_attr registers a tag-attr combination.
+	Note that it will automatically call SH_Validator_register_tag if the
+	tag doesn't exist. Later it will be possible to set tag to NULL to
+	register a global attribute, but for now doing so will crash.
+
+	The method SH_Validator_deregister_attr removes a tag-attr combination
+	registered earlier. Note that deregistering a nonexistent combination
+	will result in an error. This behaviour is debatable and might be subject
+	to change. When setting only tag to NULL, all tags for this attribute
+	are deregistered. When setting only attr to NULL, all attrs for this tag
+	are deregistered. This might suffer from problems, if this involves some
+	attrs that are global. Also this will use the internal method
+	remove_tag_for_all_attrs, which has the problem that it might fail
+	partially. Normally, when failing, all functions revert the program to
+	the same state as it was in before the call. This function, however, is
+	different: if it fails, there might be some combinations that haven't
+	been removed, while others already have. Nevertheless, the validator is
+	still in a valid state, so it is possible to call this function a second
+	time, but it is not known which combinations are already deregistered.
+
+	As the attrs also use the internal strings of the tags, it must be
+	ensured, when a tag is deregistered, that all remaining references are
+	removed; otherwise there would be dangling pointers. Note that
+	remove_tag_for_all_attrs is also used for this, so the method
+	SH_Validator_deregister_tag suffers from the same problems listed above.
+	Also if this internal method fails, the tag won't be removed at all.
+
+	Similar to the tags, the attributes can be initialized. Missing tags are
+	automatically added. The declaration syntax is currently a bit annoying,
+	as the tags that belong to an attribute either have to be declared
+	explicitly or a pointer to the tag declaration must be given, but then
+	only consecutive tags are possible.
+	Support for global attributes is likewise missing; it must be ensured
+	that (tag_n != 0) && (tags != NULL). Otherwise the validator will be
+	inconsistent and there might be a bug.
+
+	Global attributes are represented by empty attributes. A global
+	attribute is an attribute that is accepted for any tag.
+	Removing a specific tag from a global attribute is refused, as this
+	would mean to "localize" the attribute, making it not global anymore.
+	The method to do that and a predicate for globalness are still missing.
+
+	Deregistering a global attribute normally is not possible, as basically
+	every other tag has to be added. This has now been implemented.
+	Originally it was intended to provide the caller with the information
+	that a global attribute has to be converted into a local one before
+	removal. However, such internals should not be exposed to the caller. As
+	it stands, there is no real reason to inform a caller whether an
+	attribute is local or global. Also, there is the problem that the
+	predicate is burdened with the possibility that the attribute doesn't
+	exist, thus it can't return a boolean directly. For both reasons, the
+	predicate isn't added yet.
+	Also a bug was detected in the method remove_tag_for_all_attrs. It
+	removes an attribute while also iterating over it, thus potentially
+	skipping over some attribute and maybe also invoking undefined behaviour
+	by deallocating space after the array.
+
+
 	Copying a Validator could be useful if multiple html versions are to be
 	supported. Another use case is a blacklist XSS-Scanner.

@@ -565,9 +722,7 @@ Tests:
 	passed to another unit.
 	Because sometimes an overflow condition is checked, it is
 	necessary to include the sourcefile into the test, instead of
-	linking against the objectfile. This also allows for the
-	separate testing of static functions, as the static keyword
-	can be overridden with an empty macro.
+	linking against the objectfile.
 	Sometimes it isn't possible to check for correct overflow
 	detection by setting some number to ..._MAX, because this
 	number is used, thus a SIGSEGV would be raised. This is solved

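Note on the Validator hunk above: the approach described there (tags given in read-only memory, sorted by insertion sort into the validator's own array, then looked up in logarithmic time) can be sketched roughly as follows. This is only a minimal illustration under assumptions, not the SeFHT implementation; the names tag_table, tag_table_init and tag_table_contains are hypothetical and do not appear in the commit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sorted tag table; not the actual SeFHT structure. */
struct tag_table {
	const char **tags;	/* sorted; points at the caller's strings */
	size_t n;
};

/* Copy n tags from a read-only, unsorted array into a newly allocated
 * array, keeping the destination sorted after every step (insertion
 * sort). A single allocation is made and no scratch buffer is needed. */
static bool
tag_table_init (struct tag_table *t, const char *const *src, size_t n)
{
	t->tags = malloc ((n ? n : 1) * sizeof *t->tags);
	if (t->tags == NULL) return false;

	for (size_t i = 0; i < n; i++) {
		size_t j = i;
		/* shift larger entries one slot to the right */
		while (j > 0 && strcmp (t->tags[j - 1], src[i]) > 0) {
			t->tags[j] = t->tags[j - 1];
			j--;
		}
		t->tags[j] = src[i];
	}
	t->n = n;
	return true;
}

/* Binary search over the sorted array: O(log n) comparisons. */
static bool
tag_table_contains (const struct tag_table *t, const char *tag)
{
	size_t lo = 0, hi = t->n;
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int cmp = strcmp (t->tags[mid], tag);
		if (cmp == 0) return true;
		if (cmp < 0) lo = mid + 1;
		else hi = mid;
	}
	return false;
}
```

The source array can stay in whatever logically grouped order is easiest to maintain, since the order is established while copying. The caller is expected to free the tags member when the table is no longer needed.
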
--- a/sefht.geany
+++ b/sefht.geany
@@ -34,7 +34,7 @@ FILE_NAME_1=134;None;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2F
 FILE_NAME_2=1737;Sh;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fconfigure.ac;0;8
 FILE_NAME_3=73;Make;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2FMakefile.am;0;8
 FILE_NAME_4=19;C;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Fmain.c;0;8
-FILE_NAME_5=3555;None;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fdocs%2Fcommit_messages.txt;0;8
+FILE_NAME_5=31034;None;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fdocs%2Fcommit_messages.txt;0;8
 FILE_NAME_6=1867;Make;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Flib%2FMakefile.am;0;8
 FILE_NAME_7=18;C;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Flib%2Fsefht%2Fcms.c;0;8
 FILE_NAME_8=18;C;0;EUTF-8;1;1;0;%2Fhome%2Fjonathan%2FDokumente%2Fprojekte%2Fprgm%2Finternet%2Fweb%2FSeFHT%2Fsrc%2Flib%2Fsefht%2Fcms.h;0;8