Antranig Basman antranig.basman at colorado.edu
Thu Jun 2 19:11:45 UTC 2011

Hi Cindy - the purpose of the regex is actually to canonicalise HTML tag names rather than user text, so 
your test should include editing a field which contains an initial value which is some markup. THe issue is 
not so much that of "loss" as of false detection of change - so the acid test is to ensure that a field with 
an ORIGINAL rich text value can be edited, and immediately saved without triggering change detection. The 
tests you have performed are useful too - but those can actually be done in automated tests rather than 
needing to be done manually.


On 02/06/2011 13:02, Li, Cindy wrote:
> Hi Antranig,
> Following up yesterday's dev meeting, I'm helping with testing the regex that was modified in this jira.
> What I've done:
> 1. I came up with a string that contains all the keyboard characters and letters in lower&  upper case
> 2. input the string into inline edit fields of these demos:
> src/webapp/demos/inlineEdit/simple/html/inlineEdit.html
> src/webapp/demos/inlineEdit/rich/html/inlineEdit.html
> 3. ensure none of the characters in the string was lost during edit, save, undo, redo.
> So far so good.
> Do you think this test is decent enough? Anything else I should test? Thanks.
> Cindy
> ________________________________________
> From: Antranig Basman [antranig.basman at colorado.edu]
> Sent: 31 May 2011 21:02
> To: Fluid Work; Li, Cindy
> Subject: FLUID-4132
> Hi there - Justin has asked me to review the pull branch at
> https://github.com/fluid-project/infusion/pull/22 - in particular the regexp which on line 67 of
> InlineEditIntegrations.js has been changed from /\<(\S+)[^\>\s]*\>/g to /\<([a-z0-9A-Z\/]+)\>/g to head off
> a JSLint error.
>   From what I can see, the error JSLint is reporting is
> Problem at line 67 character 38: Insecure '^'.
> This seems like a pretty subtle issue... hunting around, I find this fairly clear explanation at
> stackOverflow of what Crockford's intent might have been when issuing this error.
> http://stackoverflow.com/questions/4109214/jslint-insecure-in-regular-expression
> I can think of a few ways of approaching this issue...
> Firstly, I think an important angle is that this regular expression is being used for canonicalization,
> rather than for validation. The purpose of the "normalizeHTML" function is to fold different expressions of
> essentially the same markup into a canonical string form, so that it is possible to tell whether, by
> operating the rich text control, the user really made a substantial change to the markup or not, by simply
> comparing the mapped string for equality. Or more accurately, the user experience we want is NOT for the
> user to open the rich text control, issue a "save" without making any changes whatsoever, but for the
> control to have considered there has been a change.
> So - it is not necessary for the canonicalizer to be particularly "aggressive", or accurate - but we need to
> make sure that there are no false positives of the kind just described.
> Carrying on, the particular regexp that was put there was not terribly "well-attested" - this falls into
> that grey area of functionality that is very hard to address with unit tests since it involves actually
> taking the particular rich text control in question through its lifecycle and seeing what happens. That
> said, it is not certainly impossible to construct a test for this function as it is for focus/blur issues,
> just probably rather difficult. The "acid test" for this regexp consists of taking our "demo" page for the
> rich text control through the cycle of triggering edit, saving immediately, and seeing if the "undo" widget
> appears or not.
> Some further things we could say about the regexp - if its domain were intended to be all of XML, it would
> be no good - http://www.w3.org/TR/2004/REC-xml11-20040204/ - there are numerous characters in "NameChar" and
> "NameStartChar" that aren't covered. However, given it is only intended to operate on the standard dialect
> of HTML tags, it is probably good enough - http://www.w3.org/TR/html401/sgml/dtd.html .
> Another thing to say about the regexp.... it looks like it is trying to "do that thing which should never be
> done", which is, "to parse HTML using a regexp" -
> http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags
> All we can say here is to fall back on the role of the regexp precisely - just as it is not intended for
> "validation", it is also not intended for "parsing" - all we need is "good enough canonicalisation for this
> purpose" which I think that Cindy's regexp achieves - so long as it passes the manual test. This concept of
> "good enough" does depend on the exact rich text controls that we are supporting - the task may have got
> easier since we ditched FCKEditor.
> The final thing we could say about the regexp - this dates from an era where our community standards for
> these kinds of things were lower than they are now, otherwise the author (in this instance, me) would have
> put a comment linking to a JIRA explaining at least some of these issues which are listed above :)
> =========
> The bottom line - since the regexp is used for canonicalisation rather than validation, I don't think
> Crockford's warning needs to be taken seriously - however, we may as well get rid of it by rewriting the
> expression if we can, as long as we can verify that the expression is still good for its original purpose by
> running manual tests on our integrations demo for the rich text widgets we are intending to support.

More information about the fluid-work mailing list