Hi,
The first piece of news is that, in the near future, I will spend more time on the INI validation code than on ELAPI, contrary to the original plan. So ELAPI work will be deferred. That is the decision, at least for now.
Second, I have come to the realization that the internal data representation for the INI collection should change. There is a bunch of data that makes sense to store together with the actual configuration value:
1) The line number.
2) Whether the value was read from the file originally, was added on the second pass, or was generated automatically because another value implies it.
3) In the future it might also store some state or other additional information needed for validation. For example: was the value successfully validated and, if not, what was the error?
We do not need to define all the use cases now. What matters is that "just a value" is no longer enough. So I am thinking of replacing the "value" in the configuration collection with a "value object" that can store the information mentioned above together with the "value" itself. I think it should be a structure, since it is internal and can easily be extended internally as needed. The same is true of the interfaces for dealing with the object: since it is going to be an internal object, we have no obligation to keep the interfaces the same.
So the interface will have create, destroy, and a bunch of set- and get-style methods. Though it is C and not C++, I am still following a pattern of "loose coupling" and creating "facades" rather than letting anyone deal with the structure directly. I hope there are no objections on that front.
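To make the idea concrete, here is a minimal sketch in C of what such a value object and its facade could look like. All names (`struct value_obj`, `value_create`, `value_get_line` and so on) are hypothetical and only illustrate the pattern; storing the validation result as a plain `int` is likewise an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Where a value came from (hypothetical names, for illustration only). */
enum value_origin {
    VALUE_FROM_FILE,    /* read from the INI file */
    VALUE_SECOND_PASS,  /* added on a later pass */
    VALUE_IMPLIED       /* generated because another value implies it */
};

/* Callers would only see "struct value_obj;" in the header. */
struct value_obj {
    char *value;              /* the configuration value itself */
    unsigned line;            /* line number in the source file */
    enum value_origin origin;
    int validation_error;     /* 0 = valid or not yet validated */
};

/* Create a value object; returns NULL on allocation failure. */
struct value_obj *value_create(const char *value, unsigned line,
                               enum value_origin origin)
{
    struct value_obj *vo = calloc(1, sizeof(*vo));
    if (vo == NULL)
        return NULL;

    vo->value = malloc(strlen(value) + 1);
    if (vo->value == NULL) {
        free(vo);
        return NULL;
    }
    strcpy(vo->value, value);
    vo->line = line;
    vo->origin = origin;
    return vo;
}

/* Destroy a value object; NULL is accepted and ignored. */
void value_destroy(struct value_obj *vo)
{
    if (vo == NULL)
        return;
    free(vo->value);
    free(vo);
}

/* Get/set facade: nobody touches the structure fields directly. */
const char *value_get_string(const struct value_obj *vo) { return vo->value; }
unsigned value_get_line(const struct value_obj *vo) { return vo->line; }
enum value_origin value_get_origin(const struct value_obj *vo) { return vo->origin; }
int value_get_error(const struct value_obj *vo) { return vo->validation_error; }
void value_set_error(struct value_obj *vo, int error) { vo->validation_error = error; }
```

Because callers go through the accessors only, new fields can be added to the structure later without touching any calling code.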
But as I started looking into the value object, I realized that this is a perfect time to introduce, or at least think about, support for multi-line values in INI files. I see two use cases that need to be handled differently:
1) I have a long line that I just want to split across several lines for readability.

key = my long multi line value \
that I want to split in the ini file \
between different lines for readability.

In this case the splitting across lines is done just for readability, and the application would expect a value consisting of one buffer with all lines concatenated, new lines removed and NEW LINE indicators removed. In the example above the NEW LINE indicator is a back slash, but we will talk about alternative indicators in more detail below.
2) I have a format where the new lines are embedded in the value. For example, the PEM format for a certificate expects one buffer that consists of a set of concatenated lines with the NEW LINE symbols at the end of those lines preserved, since they are part of the format. In the ini file it will look like this:
-----BEGIN CERTIFICATE REQUEST-----
MIIBnTCCAQYCAQAwXTELMAkGA1UEBhMCU0cxETAPBgNVBAoTCE0yQ3J5cHRvMRIw
EAYDVQQDEwlsb2NhbGhvc3QxJzAlBgkqhkiG9w0BCQEWGGFkbWluQHNlcnZlci5l
eGFtcGxlLmRvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAr1nYY1Qrll1r
uB/FqlCRrr5nvupdIN+3wF7q915tvEQoc74bnu6b8IbbGRMhzdzmvQ4SzFfVEAuM
MuTHeybPq5th7YDrTNizKKxOBnqE2KYuX9X22A1Kh49soJJFg6kPb9MUgiZBiMlv
tb7K3CHfgw5WagWnLl8Lb+ccvKZZl+8CAwEAAaAAMA0GCSqGSIb3DQEBBAUAA4GB
AHpoRp5YS55CZpy+wdigQEwjL/wSluvo+WjtpvP0YoBMJu4VMKeZi405R7o8oEwi
PdlrrliKNknFmHKIaCKTLRcU59ScA6ADEIWUzqmUzP5Cs6jrSRo3NKfg1bd09D1K
9rsQkRc9Urv9mRBIsredGnYECNeRaK5R1yzpOowninXC
-----END CERTIFICATE REQUEST-----
So, to address both use cases, I propose that the INI interface implement the following logic:

Read a line from the INI file into buffer
Label:
IF the NEW LINE indicator is present at the end of the buffer THEN
    IF indicator shows that the NEW LINES should be stripped THEN
        1) The indicator is stripped and end of line character(s) are stripped
        2) Next line is read and appended to the value.
        3) Goto Label
    ELSE IF indicator shows that the NEW LINES should not be stripped THEN
        1) The indicator is stripped and end of line character(s) are stripped
        2) In place of the stripped data the new line character is inserted
        3) Next line is read and appended to the value.
        4) Goto Label
    ELSE
        ERROR
    ENDIF
ELSE
    We are done with this value.
ENDIF
Now let us talk about the NEW LINE indicator. I think of it as a sequence of characters that indicates that we have a multi-line value that either should or should not have the new line symbol as part of the resulting concatenated string. It can be a symbol, a series of symbols, or a pattern.
The most logical and most convenient patterns, it seems to me (and this is where I heard some resistance), would be the following:
a) The new line indicator that does not preserve the new line is a back slash followed by any number of spaces or tabs. Example (notice that there are spaces after the slash):

key = my long multi line value \
that I want to split in the ini file \
between different lines for readability.

b) The new line indicator that preserves the new line is a back slash and the symbol 'n', followed by any number of spaces or tabs.
Example (notice that there are spaces after the 'n'):

key = my long multi line value with \n
the preserved new lines because this \n
is the format my application expects.
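The logic and the two patterns above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the names `strip_indicator` and `join_value` are hypothetical, physical lines are passed in as an array instead of being read from a file, and each line is assumed to fit in a 256-byte scratch buffer.

```c
#include <stddef.h>
#include <string.h>

enum continuation { CONT_NONE, CONT_JOIN, CONT_KEEP_NEWLINE };

/*
 * Strips end-of-line characters from `line`. If the line then ends with
 * an indicator (a back slash, or a back slash and 'n', optionally
 * followed by spaces or tabs), the indicator and the blanks after it
 * are stripped too. Returns which kind of indicator was found.
 */
enum continuation strip_indicator(char *line)
{
    size_t len = strlen(line);

    /* Drop end-of-line characters ("\n" or "\r\n"). */
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r'))
        len--;
    line[len] = '\0';

    /* Skip spaces and tabs that may follow the indicator. */
    size_t end = len;
    while (end > 0 && (line[end - 1] == ' ' || line[end - 1] == '\t'))
        end--;

    if (end >= 2 && line[end - 2] == '\\' && line[end - 1] == 'n') {
        line[end - 2] = '\0';
        return CONT_KEEP_NEWLINE;
    }
    if (end >= 1 && line[end - 1] == '\\') {
        line[end - 1] = '\0';
        return CONT_JOIN;
    }
    return CONT_NONE;
}

/*
 * Joins physical lines into one value in `out` (capacity `cap`).
 * Returns 0 on success, -1 if a buffer is too small or the last line
 * still ends with an indicator.
 */
int join_value(const char *const *lines, size_t count, char *out, size_t cap)
{
    char buf[256];  /* scratch copy of one line (assumed upper bound) */
    size_t used = 0;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        if (strlen(lines[i]) >= sizeof(buf))
            return -1;
        strcpy(buf, lines[i]);
        enum continuation c = strip_indicator(buf);

        size_t n = strlen(buf);
        size_t need = n + (c == CONT_KEEP_NEWLINE ? 1 : 0);
        if (used + need + 1 > cap)
            return -1;
        memcpy(out + used, buf, n);
        used += n;
        if (c == CONT_KEEP_NEWLINE)
            out[used++] = '\n';  /* new line replaces the stripped data */
        out[used] = '\0';

        if (c == CONT_NONE)
            return 0;  /* we are done with this value */
    }
    return -1;  /* ran out of lines while a continuation was pending */
}
```

In this sketch the ERROR branch of the pseudocode never triggers, because every recognized indicator is one of the two kinds; it would matter only if more indicator types were added.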
I heard some concerns that these patterns should not be used the way I propose, since some other applications, like make, allow no spaces after "\".
But I do not see what harm my approach does. It means that those who make the mistake of putting a space after the slash are not punished. The spaces at the end are irrelevant, so why should I punish the users of the INI interface, and of the applications built on top of it, for putting in a space that has no meaning and is stripped anyway? Maybe I should use some other patterns instead, so that nobody confuses them with escaping?
Like:
key = my long multi line value +
that I want to split in the ini file +
between different lines for readability.
And
key = my long multi line value with &
the preserved new lines because this &
is the format my application expects.
Comments and suggestions welcome!
Thank you, Dmitri Pal
Engineering Manager IPA project, Red Hat Inc.
On 04/13/2010 02:56 PM, Dmitri Pal wrote:
We do not need to define all the use cases now. What matters is that "just a value" is no longer enough. So I am thinking of replacing the "value" in the configuration collection with a "value object" that can store the information mentioned above together with the "value" itself. I think it should be a structure, since it is internal and can easily be extended internally as needed. The same is true of the interfaces for dealing with the object: since it is going to be an internal object, we have no obligation to keep the interfaces the same.
Agreed, this should be converted to an opaque internal object.
As I said yesterday, my feeling is that we should follow RFC 822 here (as the Python INI parser does).
For lines that are just long and require no special characters:

name=valuevaluevalue
    continuationafterwhitespace

In this case, it would be read in as:
{{{
valuevaluevalue continuationafterwhitespace
}}}
RFC 822 requires the parser to do line continuations only at points where the resulting value can accept whitespace. This whitespace is truncated to a single space in the final value.
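A minimal C sketch of this kind of unfolding, under the reading given above (a line break followed by spaces or tabs becomes a single space). The function name `unfold_value` is hypothetical, and stopping at the first line break that does not start a continuation line is an assumption made for the illustration.

```c
#include <stddef.h>

/*
 * Copies `in` to `out`, replacing each line break that is followed by
 * spaces or tabs (a folded line) with a single space. Copying stops at
 * the first line break that does not start a continuation line.
 * Returns 0 on success, -1 if `out` (capacity `cap`) is too small.
 */
int unfold_value(const char *in, char *out, size_t cap)
{
    size_t used = 0;

    while (*in != '\0') {
        if (*in == '\r' || *in == '\n') {
            const char *p = in;
            if (*p == '\r')
                p++;
            if (*p == '\n')
                p++;
            if (*p != ' ' && *p != '\t')
                break;  /* not a continuation line: value ends here */
            while (*p == ' ' || *p == '\t')
                p++;    /* collapse the leading blanks */
            if (used + 1 >= cap)
                return -1;
            out[used++] = ' ';
            in = p;
            continue;
        }
        if (used + 1 >= cap)
            return -1;
        out[used++] = *in++;
    }
    if (used >= cap)
        return -1;
    out[used] = '\0';
    return 0;
}
```

Note that a value with no natural whitespace, such as a long base64 string, cannot be folded this way without gaining a space at each fold.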
Now, if we want to include a value that contains newlines, it should be done as follows:
name=valuevaluevalue\ncontinueafternewline continueafterspace

which would result in the string:
{{{
valuevaluevalue
continueafternewline continueafterspace
}}}

The parser would have to handle the following escape characters:
\n -> newline
\r -> carriage return
\\ -> literal backslash
I don't think there's any value in handling any other escapes, but others may disagree.
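For illustration, decoding just these three escapes could look like the following C sketch. The name `unescape_value` is hypothetical, and rejecting unknown escapes and a lone trailing backslash is an assumption; how those cases should be handled is not decided here.

```c
#include <stddef.h>

/*
 * Decodes \n, \r and \\ in place. Returns 0 on success, -1 on an
 * unknown escape or a lone trailing backslash (an assumed policy).
 * On error the buffer may be left partially decoded.
 */
int unescape_value(char *s)
{
    char *dst = s;

    for (const char *src = s; *src != '\0'; src++) {
        if (*src != '\\') {
            *dst++ = *src;  /* ordinary character: copy as is */
            continue;
        }
        src++;              /* look at the character after the backslash */
        switch (*src) {
        case 'n':  *dst++ = '\n'; break;
        case 'r':  *dst++ = '\r'; break;
        case '\\': *dst++ = '\\'; break;
        default:   return -1;  /* includes '\0': lone trailing backslash */
        }
    }
    *dst = '\0';
    return 0;
}
```

Decoding in place is safe because the decoded string is never longer than the encoded one.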
-- 
Stephen Gallagher
RHCE 804006346421761
Delivering value year after year. Red Hat ranks #1 in value among software vendors. http://www.redhat.com/promo/vendor/
Stephen Gallagher wrote:
As I said yesterday, my feeling is that we should follow RFC 822 here (as the Python INI parser does).
For lines that are just long and require no special characters:

name=valuevaluevalue
    continuationafterwhitespace

In this case, it would be read in as:
{{{
valuevaluevalue continuationafterwhitespace
}}}
RFC 822 requires the parser to do line continuations only at points where the resulting value can accept whitespace. This whitespace is truncated to a single space in the final value.
Frankly, I do not like this format. I think it is too limiting. It was used for a special case, and now many implementations have copied it. But I really do not see a compelling reason to do it the same way. What if you have a long base64-encoded string that you want to use? What spaces are you talking about? IMO it is just the wrong model to follow in the first place. Also, as far as I understood from Nalin, for the example I gave here the cert format includes the new line characters in the output buffer. This is not something that 822 specifies, but it is what PEM expects. PEM expects new lines and no spaces, while, as you mentioned, 822 talks about a single whitespace, not a new line.
Now, if we want to include a value that contains newlines, it should be done as follows:
name=valuevaluevalue\ncontinueafternewline continueafterspace

which would result in the string:
{{{
valuevaluevalue
continueafternewline continueafterspace
}}}

The parser would have to handle the following escape characters:
\n -> newline
\r -> carriage return
\\ -> literal backslash
I don't think there's any value in handling any other escapes, but others may disagree.
This is totally wrong IMO. You are trying to introduce escape sequences, and I do not want to do this. It is too much for nothing. The place where you cut the value is the place where you want the new line character. What you suggest is counterintuitive, at least to me. Does everybody agree with Stephen?
Put on your user hat, not your developer hat. Think of a person tweaking a value in a config file by following some doc. He might not be as savvy in programming as one might think. For such a person the whole notion of escaping is counterintuitive. For such a person it would be much more logical to put a special symbol at the end of the line indicating that the value continues on the next line.
The \r -> carriage return is not needed. You either have a new line sequence in the buffer you return to the calling application or you do not. On UNIX it is just one character. On Windows it is two (ASCII 13 10).
Steve, why are you trying to push this towards an "escaping" solution, which is more complex and less intuitive than the "end of line indicator" I suggest?
_______________________________________________ sssd-devel mailing list sssd-devel@lists.fedorahosted.org https://fedorahosted.org/mailman/listinfo/sssd-devel
On 04/13/2010 03:59 PM, Dmitri Pal wrote:
This is totally wrong IMO. You are trying to introduce escape sequences, and I do not want to do this. It is too much for nothing. The place where you cut the value is the place where you want the new line character. What you suggest is counterintuitive, at least to me. Does everybody agree with Stephen?
Put on your user hat, not your developer hat. Think of a person tweaking a value in a config file by following some doc. He might not be as savvy in programming as one might think. For such a person the whole notion of escaping is counterintuitive. For such a person it would be much more logical to put a special symbol at the end of the line indicating that the value continues on the next line.
See below.
The \r -> carriage return is not needed. You either have a new line sequence in the buffer you return to the calling application or you do not. On UNIX it is just one character. On Windows it is two (ASCII 13 10).
I tried to express this to you yesterday. Any data encoded in this way needs to be an EXACT representation. It is completely wrong to have the parser try to perform any platform-specific conversion of these values.
Steve, why are you trying to push this towards an "escaping" solution, which is more complex and less intuitive than the "end of line indicator" I suggest?
Well, first and foremost: using an INI file to store extensive binary data is just plain wrong. It was the wrong approach to use one as a database for certmonger.
Long, multi-line values are the exception, not the rule, for INI files. They should be supported only as a convenience for a few cases where they are unavoidable.
Putting on my user hat: If I ever have to enter a multi-line value into a config file, then the consuming application is Doing It Wrong.
-- 
Stephen Gallagher
RHCE 804006346421761
On 04/13/2010 02:56 PM, Dmitri Pal wrote:
Hi,
First news is that I will spend more time in INI validation code in the nearest future than in ELAPI as it was originally planned. So ELAPI work will be deferred. Decision is made at least for now.
Second is that I came to realization that the internal data representation for the INI collection should change. There is a bunch of data that makes sense to store together with the actual configuration value.
1) The line number.
2) Whether the value was read from the file originally, was added on the second pass, or was automatically generated because another value implies it.
3) In future it might also store some state or other additional information needed for the validation. For example: was the value successfully validated and, if not, what was the error.
We do not need to define all the use cases now. But the fact that "just a value" is not enough any more is important. So I am thinking of replacing the "value" in the configuration collection with a "value object" that will be able to store the information mentioned above together with the "value" itself. I think it should be a structure, since it is internal and can easily be extended internally on an as-needed basis. The same is true of the interfaces that deal with the object: since it is going to be an internal object, we have no obligation to keep the interfaces the same.
So the interface will have create, destroy, and a bunch of other set and get style methods. Though it is C and not C++, I am still following a pattern of "loose coupling" and creating "facades" rather than letting anyone deal with the structure directly. I hope there are no objections on that front.
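To make the "value object" idea concrete, here is a minimal sketch in Python rather than C. Every name in it (Origin, value_create, value_set_validation and so on) is hypothetical and chosen only for illustration; none of them is taken from libini_config. It only shows the facade pattern described above: callers go through functions and never touch the fields directly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    """Where a value came from (the three cases listed in the message)."""
    FILE = 1         # read from the file originally
    SECOND_PASS = 2  # added on the second pass
    GENERATED = 3    # generated because another value implies it


@dataclass
class _Value:
    """Internal structure; callers are expected to use the facade below."""
    value: str
    line: int
    origin: Origin
    validated: bool = False
    error: Optional[str] = None


def value_create(value, line, origin=Origin.FILE):
    return _Value(value, line, origin)


def value_get_string(vo):
    return vo.value


def value_get_line(vo):
    return vo.line


def value_get_origin(vo):
    return vo.origin


def value_set_validation(vo, error=None):
    """Record a validation result; error=None means validation succeeded."""
    vo.validated = True
    vo.error = error


def value_is_valid(vo):
    return vo.validated and vo.error is None
```

A C version would also need an explicit destroy function; Python's garbage collection makes that unnecessary in this sketch.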
But as I started looking into the value object I realized that this is a perfect time to introduce or at least think about supporting multi line values in the INI files. I see two use cases that need to be handled in a different way:
1) I have a long line that I just want to split between several lines for readability.
key = my long multi line value \
that I want to split in the ini file \
between different lines for readability.
In this case the splitting between different lines is done just for readability, and the application would expect the value to consist of one buffer with all lines concatenated, new lines removed and NEW LINE indicators removed. In the example above the NEW LINE indicator is a back slash, but we will talk about alternative indicators in more detail below.
2) I have a format where the new lines are embedded into the value.
For example, the PEM format for a certificate expects one buffer consisting of a set of concatenated lines, with the NEW LINE symbols at the end of those lines preserved since they are part of the format. In the ini file it will look like this:
-----BEGIN CERTIFICATE REQUEST-----
MIIBnTCCAQYCAQAwXTELMAkGA1UEBhMCU0cxETAPBgNVBAoTCE0yQ3J5cHRvMRIw
EAYDVQQDEwlsb2NhbGhvc3QxJzAlBgkqhkiG9w0BCQEWGGFkbWluQHNlcnZlci5l
eGFtcGxlLmRvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAr1nYY1Qrll1r
uB/FqlCRrr5nvupdIN+3wF7q915tvEQoc74bnu6b8IbbGRMhzdzmvQ4SzFfVEAuM
MuTHeybPq5th7YDrTNizKKxOBnqE2KYuX9X22A1Kh49soJJFg6kPb9MUgiZBiMlv
tb7K3CHfgw5WagWnLl8Lb+ccvKZZl+8CAwEAAaAAMA0GCSqGSIb3DQEBBAUAA4GB
AHpoRp5YS55CZpy+wdigQEwjL/wSluvo+WjtpvP0YoBMJu4VMKeZi405R7o8oEwi
PdlrrliKNknFmHKIaCKTLRcU59ScA6ADEIWUzqmUzP5Cs6jrSRo3NKfg1bd09D1K
9rsQkRc9Urv9mRBIsredGnYECNeRaK5R1yzpOowninXC
-----END CERTIFICATE REQUEST-----
So to address both use cases I propose that the INI interface would implement the following logic:
Read a line from the INI file into buffer
Label:
IF the NEW LINE indicator is present at the end of the buffer THEN
    IF indicator shows that the NEW LINES should be stripped THEN
        1) The indicator is stripped and end of line character(s) are stripped
        2) Next line is read and appended to the value.
        3) Goto Label
    ELSE IF indicator shows that the NEW LINES should not be stripped THEN
        1) The indicator is stripped and end of line character(s) are stripped
        2) In place of the stripped data the new line character is inserted
        3) Next line is read and appended to the value.
        4) Goto Label
    ELSE
        ERROR
    ENDIF
ELSE
    We are done with this value.
ENDIF
Now let us talk about the NEW LINE indicator. I think of it as a sequence of characters that indicate that we have a multi-line value that either should have or should not have the new line symbol as part of the resulting concatenated string. It can be a symbol, series of symbols or a pattern.
The most logical and most convenient, as it seems to me, (and this is where I heard some resistance) would be the following patterns:
a) New line indicator that does not preserve new line is a sequence of a back slash and any spaces or tabs after it. Example (notice that there are spaces after slash):
key = my long multi line value \
that I want to split in the ini file \
between different lines for readability.
b) New line indicator that preserves new line is a sequence of a back slash and symbol 'n' and any spaces or tabs after it.
Example (notice that there are spaces after 'n' ):
key = my long multi line value with \n
the preserved new lines because this \n
is the format my application expects.
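The read loop and the two indicator patterns (a) and (b) can be sketched roughly as follows. This is an illustration in Python, not the libini_config implementation; the function name read_value and the regular expressions are assumptions made for the sketch. With these two particular patterns the ERROR branch of the pseudocode cannot be reached, so it is omitted.

```python
import re

# Pattern (b): back slash, 'n', then optional spaces or tabs at end of line.
_KEEP_NL = re.compile(r"\\n[ \t]*$")
# Pattern (a): back slash, then optional spaces or tabs at end of line.
_DROP_NL = re.compile(r"\\[ \t]*$")


def read_value(lines):
    """Consume physical lines from an iterator until the value is complete.

    Returns the concatenated value. Lines after the end of the value are
    left in the iterator for the caller.
    """
    parts = []
    for line in lines:
        line = line.rstrip("\r\n")           # strip end of line character(s)
        if _KEEP_NL.search(line):
            # Indicator says: keep a new line in place of the stripped data.
            parts.append(_KEEP_NL.sub("", line) + "\n")
        elif _DROP_NL.search(line):
            # Indicator says: plain concatenation, no new line.
            parts.append(_DROP_NL.sub("", line))
        else:
            parts.append(line)               # no indicator: value is done
            break
    return "".join(parts)
```

Because the trailing spaces and tabs are part of both patterns, a stray space after the back slash does not change the result, which is the behaviour argued for in the message.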
I heard some concerns that these patterns should not be used the way I propose, since some other applications, like make, allow no spaces after "\".
But I do not see what harm my approach does. It means that those who made the mistake of putting a space after the slash are not punished. The spaces at the end are irrelevant, so why should I punish the users of the INI interface, and of the applications built on top of it, for a space that has no meaning and is stripped anyway? Maybe I should use some other patterns instead, so that no one confuses them with escaping?
Like:
key = my long multi line value +
that I want to split in the ini file +
between different lines for readability.
And
key = my long multi line value with&
the preserved new lines because this&
is the format my application expects.
Comments and suggestions welcome!
None of the complexity discussed above is necessary if you support quoted strings. A double quote introduces a quoted string. You read until you find the closing quote (skipping any escaped quotes, e.g. \"). The entire quoted string, including the leading and trailing quotes, is preserved as part of the attribute's value. Every character is captured verbatim (including the delimiting quotes).
Thus in the case of PEM data the application would only need to strip the leading and trailing quotes and it would have the exact textual data needed. This would also allow for easy cut-n-paste.
The leading and trailing quotes need to be preserved so that later when the attribute values read from the ini file need to be interpreted by the application it can see the quoted string and optionally decide how to parse the string (possibly escaping backslash sequences).
Thus for example:
[silly section]
my_attr = 1 " some text " true
would see the value of my_attr exactly as written above, it would likely parse that into 3 tokens (integer, string, boolean), but tokenizing would be done by the application, not the ini parser.
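As an illustration of the quoted-string rule, the following Python sketch scans from an opening double quote to its closing quote, skips escaped quotes (assumed here to be written as a back slash followed by a double quote), and returns the text verbatim with both delimiting quotes. The function name read_quoted is hypothetical.

```python
def read_quoted(text, start):
    """Return (verbatim quoted string, index just past it).

    text[start] must be the opening double quote. The returned string
    keeps both delimiting quotes and every character in between,
    including new lines and back slashes.
    """
    if text[start] != '"':
        raise ValueError("expected an opening double quote")
    i = start + 1
    while i < len(text):
        if text.startswith('\\"', i):
            i += 2                      # escaped quote: not a terminator
        elif text[i] == '"':
            return text[start:i + 1], i + 1
        else:
            i += 1
    raise ValueError("unterminated quoted string")
```

Tokenizing the rest of the value, and deciding what to do with any back slash sequences inside the quotes, stays with the application, as the message proposes.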
I had a long discussion with John about this at the end of the day. I think that we managed to come to some other consensus. If I got it right, the rules of thumb are the following:
a) Do not invent a new meaning for characters that already have some other special meaning. Be consistent in the use and meaning of the characters.
b) Do not invent new syntax. Pick one of the existing ones.
c) It is OK to define rules about the syntax of the INI file values as long as they do not contradict the first two rules.
d) It is OK not to support something if that is noted and declared in the documentation (with or without the intent to support it in future).
I think I captured our discussion reasonably well. Now about John's proposal. I mostly like it, with a couple of tweaks.
Here is what I suggest should be implemented (note: <NL> is a new line character read from the file at the end of the line, and \<NL> is a back slash immediately followed by it):
1) \<NL> should be a special sequence that is always used to indicate line wrapping. All \<NL> sequences are processed at the time the data is read from the file. Example:
my_val = Many words \
do not fit on one line!
will be read and stored in the memory of the config object as ->Many words do not fit on one line!<- This is the syntax that people use in make files and in C, where they expect the \<NL> to be thrown away and the adjacent strings concatenated. The following value
my_val = This value has \ in the \
middle of the line.
will be read and stored in the memory of the config object as ->This value has \ in the middle of the line.<-
2) If someone wants to preserve the formatting, i.e. spaces, tabs or new lines, he should (as John suggested) put the value into double quotes. But I had a dilemma: either read the quoted string verbatim and interpret it when the application asks for the value, or preprocess the value right at the moment the data is read from the file. After a long mental exercise comparing the pros and cons of the different approaches, I think the following logic would be best:
a) Use the quotes to define a sequence of characters that needs to be read verbatim. This sequence can cross multiple lines.
b) The quotes inside this sequence should be escaped with the back slash.
c) The \<NL> substitution rule works inside the quoted string too.
d) Non-escaped <NL>s are copied verbatim.
e) No support for any other escape sequence such as \n \r \t \ at least at the moment.
Example:
my_val = " Preserve \"leading\" spaces
and embed new lines in
the middle of the line."
This will be read and stored in the memory verbatim: ->" Preserve \"leading\" spaces<NL>and embed new lines in<NL>the middle of the line."<-
When the interpreting function is asked to translate it as an "escaped" string, the following string will be returned to the application: -> Preserve "leading" spaces<NL>and embed new lines in<NL>the middle of the line.<-
Here is a more interesting example:
my_val = First split \
sentence, " \"Second\" \
split
 sentence" , Th" i "rd \
split \
sentence
This whole sequence will be interpreted as one value and will be stored in the memory as: ->First split sentence, " \"Second\" split<NL> sentence" , Th" i "rd split sentence<-
When the application asks the INI interface to give it the value as one escaped string, the result will be: ->First split sentence,  "Second" split<NL> sentence , Th i rd split sentence<- (note that there are two spaces after the first comma)
If instead the application tries to interpret the value as an array, using the comma as a separator, the result will be an array of three strings: ->First split sentence<- -> "Second" split<NL> sentence<- (note the leading space) ->Th i rd split sentence<-
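The two stages described above (processing at read time, interpretation on request) can be sketched in Python as below. The function names are hypothetical, and the sketch assumes that quotes inside a quoted string are written as a back slash followed by a double quote, as rule b) requires. Array interpretation is not covered.

```python
def stored_form(raw):
    """Read-time processing: drop every back slash + <NL> pair.

    This applies inside and outside quoted strings (rules 1 and c).
    Everything else, including bare new lines and the quotes themselves,
    is kept verbatim (rules a and d).
    """
    return raw.replace("\\\n", "")


def as_escaped_string(stored):
    """Interpretation on request: strip delimiting quotes, unescape \\"."""
    out = []
    i = 0
    while i < len(stored):
        if stored.startswith('\\"', i):
            out.append('"')        # escaped quote becomes a literal quote
            i += 2
        elif stored[i] == '"':
            i += 1                 # unescaped quote is only a delimiter
        else:
            out.append(stored[i])
            i += 1
    return "".join(out)


raw = ('First split \\\nsentence, " \\"Second\\" \\\nsplit\n sentence"'
       ' , Th" i "rd \\\nsplit \\\nsentence')
stored = stored_form(raw)
result = as_escaped_string(stored)
```

Here result has two spaces after the first comma: one from outside the quotes and one from inside them, which is the behaviour noted in the example.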
I think this is sufficient functionality at the moment. In future we might decide to support other escape characters. To preserve backward compatibility when that happens, the interpreting functions would take a special flag indicating the level of escaping the application expects the library to support. Let me illustrate with an example. Say that we implement the rules described above (this is version, or level, 1) but later add the capability to substitute \n (this is version, or level, 2).
Say the config file contained the following line:
my_val = Line with \n in the middle.
In version 1 the \n is ignored. In version 2 we can substitute \n with <NL>. But if we just update the library under the application from version 1 to version 2 and start interpreting \n ourselves, that would not be what the application wants.
So the application will pass a flag indicating the level of parsing it expects. If the application was built using the v1 escaping rules, it would pass the v1 flag to the interpretation function: get_esc_str_value(..., USE_V1_ESCAPING); Then, if the library is updated to a later version, the application will not be affected by the new escaping rules until it actually decides to support them. At that point a new version of the application would use the other, later, flag and would migrate its config values to the new format.
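A rough Python sketch of the versioned flag follows. The names get_esc_str_value and USE_V1_ESCAPING come from the message; USE_V2_ESCAPING, the numeric values and the function signature are assumptions made for illustration only.

```python
USE_V1_ESCAPING = 1  # rules as proposed: only \" is recognised
USE_V2_ESCAPING = 2  # hypothetical later level: \n is substituted as well


def get_esc_str_value(stored, level):
    """Interpret a stored value using the escaping level the caller asks for."""
    out = []
    i = 0
    while i < len(stored):
        if stored.startswith('\\"', i):
            out.append('"')
            i += 2
        elif level >= USE_V2_ESCAPING and stored.startswith("\\n", i):
            out.append("\n")       # only callers that opted in to level 2
            i += 2
        elif stored[i] == '"':
            i += 1                 # delimiting quote
        else:
            out.append(stored[i])
            i += 1
    return "".join(out)
```

An application built against the v1 rules keeps passing USE_V1_ESCAPING and keeps receiving the back slash and the letter n unchanged, even after the library learns the newer rule.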
But IMO the INI should not be in the business of dealing with escape symbols beyond what is currently proposed. However, the approach allows gradual future extension without losing backward compatibility.
And by the way, no, it is not too complex. A complex thing is one that you do not know how to do. This one is not trivial, but at least I understand how it can be implemented.
Jeff Schroeder wrote:
On Tue, Apr 13, 2010 at 9:05 PM, Dmitri Pal dpal@redhat.com wrote:
[snip]
I had a long discussion with John about this at the end of the day. I think that we managed to come to some other consensus. If I got it right, the rules of thumb are the following:
a) Do not invent a new meaning for characters that already have some other special meaning. Be consistent in the use and meaning of the characters.
b) Do not invent new syntax. Pick one of the existing ones.
One of which existing ones? Are you going to be using ini such as RFC822 or are you making up something new? NIH is in general a bad policy.
RFC822 is limiting: it does not allow a way to express the use cases that I want to support:
a) Folding lines without preserving <NL>
b) Folding lines with preserving <NL>
I am looking for something that would allow me to do this. If you know how to express these two cases using the rules defined in 822, I am all ears. If not, I am looking for the solution that would be least confusing. I am not aware of any other RFC. If you are, please point me in the right direction.
So this effort is to try to identify a convention that seems logical and intuitive to use and can handle these two use cases.
Can we please come at this issue from the right direction? The reason that this is even being debated is because we learned of another program (certmonger) that is using INI files to store binary data.
THIS IS WRONG.
This is not an appropriate use-case for the INI file format, and is not an example to follow. For storing binary data, there are plenty of more sensible formats, such as yaml. In the particular case of certmonger, an SQLite database or other small embedded DB would be a much better idea. This is not my project, so it's not my concern.
We need to establish what the goals are for an INI file. In the vast majority of cases (including the SSSD), they are used as a configuration file for a program. Configuration files are expected to set site-specific values for the execution of a program. Storage of data should be relegated to a file format suitable for data storage.
So, it follows that for libini_config, we should only expect to reasonably store configuration data. So here's my proposal for what we should do:
1) First and foremost: as far as libini_config should be concerned, ALL values should be treated as a single buffer. We should not offer to handle escapes at all. This means that if the literal characters "\n" appear in the string, they should be passed to the INI consumer as two characters: a backslash followed by the letter n.
If the consumer of libini_config wishes to convert the string they got back, it is up to them. We should only be in the business of returning the exact contents of the value.
2) Leading whitespace should be ignored on all lines. This means that
name =    value
should result in the returned string "value" with no exceptions. If for some reason the consumer of the interface needs leading whitespace, it should be up to them to determine a reasonable way to denote this in their source code. They might decide to quote the whole string:
name = "   value"
or they might decide to represent leading spaces with an escape of their design:
name = %s%s%svalue
We would return these literal strings as "   value" and %s%s%svalue respectively.
3) Newlines should be handled as described in RFC822. In other words, there need be no backslash escape denoting the end of the line. If the libini_config parser detects a newline, it should check the first character of the next line. If this character is a space or tab, the parser should insert exactly one space in the resulting string.
name = value
 continued value
   continued value2
<tab>continued value3
Would be returned to the caller as:
value continued value continued value2 continued value3
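The RFC822-style folding rule can be sketched as follows. This is a Python illustration only; parse_entries is a hypothetical name, and stripping trailing whitespace and skipping blank lines are assumptions of the sketch rather than part of the proposal.

```python
def parse_entries(lines):
    """Parse name = value lines with RFC822-style continuation.

    A physical line that starts with a space or tab continues the previous
    value and is joined to it with exactly one space. Nothing is unescaped:
    a literal back slash followed by 'n' is returned as those two characters.
    """
    entries = []  # list of (name, value) pairs, in file order
    for line in lines:
        text = line.rstrip("\r\n")
        if not text.strip():
            continue                               # assumption: skip blanks
        if text[:1] in (" ", "\t") and entries:
            name, value = entries[-1]
            entries[-1] = (name, value + " " + text.strip())
        elif "=" in text:
            name, _, value = text.partition("=")
            entries.append((name.strip(), value.strip()))
    return entries
```

Any further conversion, such as turning a marker of the consumer's own design into real new lines or spaces, would happen in the calling application after this function returns.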
If the caller wanted to ensure that there were actual newlines and/or exact spaces, then they would be required to add their own escapes and post-process the return from libini_config.
In conclusion: I think attempting to solve the newline/escape problem in libini_config itself is both too complex and too limiting. It is more complexity being added to libini_config than a simple INI file parser needs to have, and by implementing any particular approach to managing the data, we are limiting its usability to only those use-cases that we can come up with.
Stephen Gallagher wrote:
Can we please come at this issue from the right direction? The reason that this is even being debated is because we learned of another program (certmonger) that is using INI files to store binary data.
THIS IS WRONG.
+1 This should simply be a pointer to a file containing the binary data. Why is data being stored in an INI file?
1) First and foremost: as far as libini_config should be concerned, ALL values should be treated as a single buffer. We should not offer to handle escapes at all. This means that if the literal characters "\n" appear in the string, they should be passed to the INI consumer as two characters: a backslash followed by the letter n.
If the consumer of libini_config wishes to convert the string they got back, it is up to them. We should only be in the business of returning the exact contents of the value.
+1 We should be returning the exact contents, and any escaping should be up to whatever is reading and using the contents. I believe we are trying to make this way too complicated. It is a simple INI file. IMHO
On Thu, Apr 15, 2010 at 6:23 AM, Stephen Gallagher sgallagh@redhat.com wrote:
[snip]
You nailed my thoughts exactly Stephen. Very well said.
On Thu, 2010-04-15 at 09:23 -0400, Stephen Gallagher wrote:
Can we please come at this issue from the right direction? The reason that this is even being debated is because we learned of another program (certmonger) that is using INI files to store binary data.
THIS IS WRONG.
...
In conclusion: I think attempting to solve the newline/escape problem in libini_config itself is both too complex and too limiting. It is more complexity being added to the libini_config than a simple INI file parser needs to have, and by implementing any particular approach to managing the data, we are limiting its usability to only those use-cases that we can come up with.
Not that I am really part of the SSSD project, but +1 from me anyway. Very sensible reasoning.
On 04/15/2010 11:07 AM, Tomas Mraz wrote:
Not that I am really part of the SSSD project, but +1 from me anyway. Very sensible reasoning.
Actually, input from outside our project is probably more valuable. We intend for libini_config to be available for use by any project that wants it.
On Thu, Apr 15, 2010 at 8:09 AM, Stephen Gallagher sgallagh@redhat.com wrote:
Actually, input from outside our project is probably more valuable. We intend for libini_config to be available for use by any project that wants it.
Alright, well, corner cases and "magic" handling of multiline stuff seem wrong. Be dumb about it and let the caller of your library figure some of that out. If someone wants to store binary data in an INI file, let them. That seems slightly insane, but it is their app. That doesn't mean you should hack up your library because your users want to do stupid things.
Jeff Schroeder wrote:
Alright, well, corner cases and "magic" handling of multiline stuff seem wrong. Be dumb about it and let the caller of your library figure some of that out. If someone wants to store binary data in an INI file, let them. That seems slightly insane, but it is their app. That doesn't mean you should hack up your library because your users want to do stupid things.
Well, I generally disagree with the approach. I think that if I sell a gun, it should be suitable for shooting yourself too (big time). But OK. This discussion boils down to "be dumb and do not handle more than you need to", and everybody seems to agree with this. Fine, it will be what Steve suggests then: no escapes and RFC822.
I still feel that we should be nicer to "crazy" people and let them "abuse" what we offer and offer more convenient options with which they can shoot themselves in all sorts of different ways. :-)
But I will comply. Thank you for your time and input!
On Thu, Apr 15, 2010 at 9:00 AM, Dmitri Pal dpal@redhat.com wrote:
Well, I generally disagree with the approach. I think that if I sell a gun, it should be suitable for shooting yourself too (big time). But OK. This discussion boils down to "be dumb and do not handle more than you need to", and everybody seems to agree with this. Fine, it will be what Steve suggests then: no escapes and RFC822.
I still feel that we should be nicer to "crazy" people and let them "abuse" what we offer and offer more convenient options with which they can shoot themselves in all sorts of different ways. :-)
But I will comply. Thank you for your time and input!
It is just a difference of design philosophy. It is much like the Ruby community, where Ruby on Rails includes everything and tries to be magical, which makes it slow, versus the Merb approach of small, light, and fast. Neither is wrong; they are merely different.
Thanks for being pragmatic and please keep on writing awesome software. As a user I can't thank you enough.
Jeff Schroeder wrote:
It is just a difference of design philosophy. It is much like the Ruby community, where Ruby on Rails includes everything and tries to be magical, which makes it slow, versus the Merb approach of small, light, and fast. Neither is wrong; they are merely different.
Thanks for being pragmatic and please keep on writing awesome software. As a user I can't thank you enough.
Thanks for the warm words. You really made my day!
On Thu, Apr 15, 2010 at 09:23:36AM -0400, Stephen Gallagher wrote:
Can we please come at this issue from the right direction? The reason that this is even being debated is because we learned of another program (certmonger) that is using INI files to store binary data.
THIS IS WRONG.
At this point I feel compelled to note that the data file is not an INI file. It looks vaguely like an INI file because tagging the data stored therein with a field name makes it easier to add and remove fields from the data file as functionality gets added. There are no sections. Earlier versions didn't even bother tagging the data.
This is not an appropriate use-case for the INI file format, and is not an example to follow. For storing binary data, there are plenty of more sensible formats, such as yaml. In the particular case of certmonger, an SQLite database or other small embedded DB would be a much better idea. This is not my project, so it's not my concern.
I'll elaborate.
One of the main goals was (and is) to avoid dragging new dependencies onto a system beyond what sssd already pulls in (this is the main reason it uses tevent rather than glib, despite my being more familiar with glib at the time -- xmlrpc-c came later). I recall we talked about LDB at the time, but we didn't have usable packages of it in Raw Hide that week.
I'm happy to entertain conversations about other storage methods (it's been listed as a possible to-do item for _months_), but I'd rather do that on the certmonger list. (I know, nobody's posted there yet, but hey, that means you can even use "First!" as a subject.)
Cheers,
Nalin
sssd-devel@lists.fedorahosted.org