On 04/13/2010 02:56 PM, Dmitri Pal wrote:
Hi,
The first piece of news is that in the near future I will spend more time on the INI validation code than on ELAPI, contrary to the original plan. So the ELAPI work will be deferred. The decision is made, at least for now.
The second is that I have come to the realization that the internal data representation for the INI collection should change. There is a bunch of data that makes sense to store together with the actual configuration value:
1) The line number.
2) Whether the value was read from the file originally, was added on the second pass, or maybe was automatically generated because another value implies it.
3) In the future it might also store some state or other additional information needed for the validation. For example: was the value successfully validated and, if not, what was the error.
We do not need to define all the use cases now. But the fact that "just a value" is not enough any more is important. So I am thinking of replacing the "value" in the configuration collection with a "value object" that will be able to store the information mentioned above along with the "value" itself. I think it should be a structure, since it is internal and can easily be extended internally on an as-needed basis. The same is true of the interfaces that deal with the object. Since it is going to be an internal object, we have no obligation to keep the interfaces the same.
Agreed, this should be converted to an opaque internal object.
So the interface will have create, destroy, and a bunch of other set and get style methods. Though it is C and not C++ I am still following a pattern of "loose coupling" and creating "facades" rather than letting anyone deal with the structure directly. Hope no objections on that front.
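To make the facade idea concrete, here is a minimal sketch of what such a value object could look like. All names (struct value_obj, value_create, the value_origin enum, and so on) are hypothetical and are not taken from the actual INI code; the struct is defined in the same block only to keep the example self-contained, whereas a real library would keep the definition private to its .c file.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical origin of a value; the names are illustrative only. */
enum value_origin {
    ORIGIN_FILE,        /* read from the INI file */
    ORIGIN_SECOND_PASS, /* added on the second pass */
    ORIGIN_IMPLIED      /* generated because another value implies it */
};

/* In a real library only "struct value_obj;" would appear in the header,
 * so callers could never touch the fields directly. */
struct value_obj {
    char *value;              /* private copy of the value text */
    unsigned line;            /* line number in the INI file */
    enum value_origin origin; /* where the value came from */
};

/* Create a value object holding a private copy of 'value'.
 * Returns NULL on allocation failure. */
struct value_obj *value_create(const char *value, unsigned line,
                               enum value_origin origin)
{
    struct value_obj *vo = malloc(sizeof(*vo));
    size_t len;

    if (vo == NULL) return NULL;

    len = strlen(value) + 1;
    vo->value = malloc(len);
    if (vo->value == NULL) {
        free(vo);
        return NULL;
    }
    memcpy(vo->value, value, len);
    vo->line = line;
    vo->origin = origin;
    return vo;
}

/* Release the object and its copy of the value; NULL is accepted. */
void value_destroy(struct value_obj *vo)
{
    if (vo == NULL) return;
    free(vo->value);
    free(vo);
}

/* Get-style accessors: the only way callers read the object. */
const char *value_get_string(const struct value_obj *vo) { return vo->value; }
unsigned value_get_line(const struct value_obj *vo) { return vo->line; }
enum value_origin value_get_origin(const struct value_obj *vo) { return vo->origin; }
```

Because callers only go through the accessors, fields such as a validation status could be added later without touching any calling code.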
But as I started looking into the value object I realized that this is a perfect time to introduce or at least think about supporting multi line values in the INI files. I see two use cases that need to be handled in a different way:
- I have a long line that I want to just split between several lines for readability.

key = my long multi line value \
that I want to split in the ini file \
between different lines for readability.

In this case the splitting between different lines is done just for readability, and the application would expect the value to consist of one buffer with all lines concatenated, new lines removed and NEW LINE indicators removed. In the example above the NEW LINE indicator is a backslash, but we will talk about alternative indicators in more detail below.
- I have a format where the new lines are embedded into the value.
For example PEM format for the certificate expects one buffer that consists of the set of the concatenated lines with NEW LINES symbols at the end of those lines preserved since they are a part of the format. In the ini file it will look like this:
-----BEGIN CERTIFICATE REQUEST-----
MIIBnTCCAQYCAQAwXTELMAkGA1UEBhMCU0cxETAPBgNVBAoTCE0yQ3J5cHRvMRIw
EAYDVQQDEwlsb2NhbGhvc3QxJzAlBgkqhkiG9w0BCQEWGGFkbWluQHNlcnZlci5l
eGFtcGxlLmRvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAr1nYY1Qrll1r
uB/FqlCRrr5nvupdIN+3wF7q915tvEQoc74bnu6b8IbbGRMhzdzmvQ4SzFfVEAuM
MuTHeybPq5th7YDrTNizKKxOBnqE2KYuX9X22A1Kh49soJJFg6kPb9MUgiZBiMlv
tb7K3CHfgw5WagWnLl8Lb+ccvKZZl+8CAwEAAaAAMA0GCSqGSIb3DQEBBAUAA4GB
AHpoRp5YS55CZpy+wdigQEwjL/wSluvo+WjtpvP0YoBMJu4VMKeZi405R7o8oEwi
PdlrrliKNknFmHKIaCKTLRcU59ScA6ADEIWUzqmUzP5Cs6jrSRo3NKfg1bd09D1K
9rsQkRc9Urv9mRBIsredGnYECNeRaK5R1yzpOowninXC
-----END CERTIFICATE REQUEST-----
So to address both use cases I propose that the INI interface would implement the following logic:
Read a line from the INI file into buffer
Label:
IF the NEW LINE indicator is present at the end of the buffer THEN
    IF indicator shows that the NEW LINES should be stripped THEN
        1) The indicator is stripped and end of line character(s) are stripped
        2) Next line is read and appended to the value.
        3) Goto Label
    ELSE IF indicator shows that the NEW LINES should not be stripped THEN
        1) The indicator is stripped and end of line character(s) are stripped
        2) In place of the stripped data the new line character is inserted
        3) Next line is read and appended to the value.
        4) Goto Label
    ELSE
        ERROR
    ENDIF
ELSE
    We are done with this value.
ENDIF
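The logic above could be sketched in C roughly as follows. This is only an illustration under several assumptions: the lines are already in memory with their end-of-line characters removed, the indicators are the backslash and backslash-plus-'n' patterns discussed in this thread, and continuation lines are appended as they are. The names nl_indicator, classify and join_value are made up for the example. With only two indicator kinds the ERROR branch has no counterpart here.

```c
#include <stddef.h>
#include <string.h>

enum nl_indicator { NL_NONE, NL_STRIP, NL_KEEP };

/* Classify the end of 'line' (*len bytes, no line terminator). If an
 * indicator is found, *len is reduced so that the indicator and any
 * blanks after it are cut off; otherwise *len is left unchanged. */
static enum nl_indicator classify(const char *line, size_t *len)
{
    size_t n = *len;

    /* Spaces and tabs after the indicator are tolerated. */
    while (n > 0 && (line[n - 1] == ' ' || line[n - 1] == '\t')) n--;

    if (n >= 1 && line[n - 1] == '\\') {
        *len = n - 1;
        return NL_STRIP;
    }
    if (n >= 2 && line[n - 2] == '\\' && line[n - 1] == 'n') {
        *len = n - 2;
        return NL_KEEP;
    }
    return NL_NONE;
}

/* Join physical lines into one logical value in 'out' (capacity 'outsz').
 * Returns the number of lines consumed, or 0 if 'out' is too small or the
 * input ends while a continuation is still expected. */
size_t join_value(const char *const *lines, size_t count,
                  char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    for (i = 0; i < count; i++) {
        size_t len = strlen(lines[i]);
        enum nl_indicator ind = classify(lines[i], &len);
        size_t extra = (ind == NL_KEEP) ? 1 : 0;

        if (used + len + extra + 1 > outsz) return 0;
        memcpy(out + used, lines[i], len);
        used += len;
        if (ind == NL_KEEP) out[used++] = '\n'; /* new line is preserved */
        out[used] = '\0';

        if (ind == NL_NONE) return i + 1;       /* done with this value */
    }
    return 0; /* ran out of lines while a continuation was pending */
}
```

Note that in this sketch a value whose last line legitimately ends in a backslash or in "\n" cannot be expressed, which is part of the concern about confusing the indicator with escaping.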
Now let us talk about the NEW LINE indicator. I think of it as a sequence of characters that indicates that we have a multi-line value that either should or should not have the new line symbol as part of the resulting concatenated string. It can be a symbol, a series of symbols or a pattern.
The most logical and most convenient, as it seems to me, (and this is where I heard some resistance) would be the following patterns:
a) The new line indicator that does not preserve the new line is a sequence of a backslash and any spaces or tabs after it. Example (notice that there are spaces after the backslash):

key = my long multi line value \
that I want to split in the ini file \
between different lines for readability.

b) The new line indicator that preserves the new line is a sequence of a backslash and the symbol 'n' and any spaces or tabs after it.
Example (notice that there are spaces after 'n'):

key = my long multi line value with \n
the preserved new lines because this \n
is the format my application expects.

I heard some concerns that these patterns should not be used the way I propose since some other applications, like make, allow no spaces after "\".
But I do not see what harm my approach does. It keeps those who mistakenly put a space after the backslash from being punished. The spaces at the end are irrelevant, so why should I punish the users of the INI interface, and of the applications built on top of it, for a space that has no meaning and is stripped anyway? Maybe I should use some other patterns instead so that no one confuses them with escaping?
Like:
key = my long multi line value +
that I want to split in the ini file +
between different lines for readability.

And

key = my long multi line value with &
the preserved new lines because this &
is the format my application expects.

Comments and suggestions welcome!
Thank you, Dmitri Pal
As I said yesterday, my feeling is that we should follow RFC 822 here (as the python INI parser does).
For lines that are just long and require no special characters:

name=valuevaluevalue
    continuationafterwhitespace

In this case, it would be read in as:
{{{
valuevaluevalue continuationafterwhitespace
}}}
RFC 822 requires the parser to do line continuations only at points where the resulting value can accept whitespace. This whitespace is truncated to a single space in the final value.
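A rough sketch of this kind of unfolding is shown below, assuming the lines are already in memory without their end-of-line characters and that any line beginning with a space or tab continues the previous one. The function name unfold_value is made up for the example. Trailing whitespace on the preceding line and whitespace-only continuation lines are not treated specially here.

```c
#include <stddef.h>
#include <string.h>

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Unfold a value that starts at lines[0] and may continue on following
 * lines that begin with a space or tab. The leading whitespace of each
 * continuation line is replaced by a single space. Returns the number of
 * lines consumed, or 0 if there is no input or 'out' is too small. */
size_t unfold_value(const char *const *lines, size_t count,
                    char *out, size_t outsz)
{
    size_t used = 0;
    size_t i;

    if (count == 0 || outsz == 0) return 0;
    out[0] = '\0';

    for (i = 0; i < count; i++) {
        const char *p = lines[i];
        size_t len;

        if (i > 0) {
            if (!is_blank(*p)) break;   /* not a continuation line */
            while (is_blank(*p)) p++;   /* drop the leading whitespace */
        }
        len = strlen(p);
        /* room for the joining space (if any), the text and the NUL */
        if (used + (i > 0 ? 1 : 0) + len + 1 > outsz) return 0;
        if (i > 0) out[used++] = ' ';
        memcpy(out + used, p, len);
        used += len;
        out[used] = '\0';
    }
    return i;
}
```

Unlike the indicator-based approach, nothing has to be placed at the end of a line; whether a line is a continuation is decided by how the next line starts.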
Now, if we want to include a value that contains newlines, it should be done as follows:

name=valuevaluevalue\ncontinueafternewline
    continueafterspace

which would result in the string:
{{{
valuevaluevalue
continueafternewline continueafterspace
}}}
The parser would have to handle the following escape sequences:
\n -> newline
\r -> carriage return
\\ -> literal backslash
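A minimal sketch of decoding just these three escapes in place might look like the following. The function name unescape_value is made up, and rejecting every other escape (including a backslash at the very end of the value) is an assumption of the example, not something decided in this thread.

```c
#include <stddef.h>

/* Decode \n, \r and \\ in place. The result is never longer than the
 * input, so no extra buffer is needed. Returns 0 on success, or -1 on an
 * unknown escape or a trailing backslash; in that case the contents of
 * 's' are left partially decoded and should be discarded. */
int unescape_value(char *s)
{
    char *dst = s;
    const char *src = s;

    while (*src != '\0') {
        if (*src != '\\') {
            *dst++ = *src++;    /* ordinary character, copy as is */
            continue;
        }
        src++;                  /* skip the backslash */
        switch (*src) {
        case 'n':  *dst++ = '\n'; break;
        case 'r':  *dst++ = '\r'; break;
        case '\\': *dst++ = '\\'; break;
        default:   return -1;   /* also hit when the backslash is last */
        }
        src++;
    }
    *dst = '\0';
    return 0;
}
```

Running this step after the continuation lines have been joined keeps the two concerns separate: the joining step never needs to know about escapes, and the decoder never sees line boundaries.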
I don't think there's any value in handling any other escapes, but others may disagree.
-- 
Stephen Gallagher
RHCE 804006346421761