Li, Cindy cli at ocad.ca
Thu Jun 2 19:02:26 UTC 2011

Hi Antranig,

Following up yesterday's dev meeting, I'm helping with testing the regex that was modified in this jira.

What I've done:

1. I came up with a string that contains all the keyboard characters and letters in lower & upper case
2. input the string into inline edit fields of these demos:


3. ensure none of the characters in the string was lost during edit, save, undo, redo.

So far so good.

Do you think this test is decent enough? Anything else I should test? Thanks.

From: Antranig Basman [antranig.basman at colorado.edu]
Sent: 31 May 2011 21:02
To: Fluid Work; Li, Cindy
Subject: FLUID-4132

Hi there - Justin has asked me to review the pull branch at
https://github.com/fluid-project/infusion/pull/22 - in particular the regexp which on line 67 of
InlineEditIntegrations.js has been changed from /\<(\S+)[^\>\s]*\>/g to /\<([a-z0-9A-Z\/]+)\>/g to head off
a JSLint error.

 From what I can see, the error JSLint is reporting is

Problem at line 67 character 38: Insecure '^'.

This seems like a pretty subtle issue... hunting around, I find this fairly clear explanation at
stackOverflow of what Crockford's intent might have been when issuing this error.


I can think of a few ways of approaching this issue...

Firstly, I think an important angle is that this regular expression is being used for canonicalization,
rather than for validation. The purpose of the "normalizeHTML" function is to fold different expressions of
essentially the same markup into a canonical string form, so that it is possible to tell whether, by
operating the rich text control, the user really made a substantial change to the markup or not, by simply
comparing the mapped string for equality. Or more accurately, the user experience we want is NOT for the
user to open the rich text control, issue a "save" without making any changes whatsoever, but for the
control to have considered there has been a change.

So - it is not necessary for the canonicalizer to be particularly "aggressive", or accurate - but we need to
make sure that there are no false positives of the kind just described.

Carrying on, the particular regexp that was put there was not terribly "well-attested" - this falls into
that grey area of functionality that is very hard to address with unit tests since it involves actually
taking the particular rich text control in question through its lifecycle and seeing what happens. That
said, it is not certainly impossible to construct a test for this function as it is for focus/blur issues,
just probably rather difficult. The "acid test" for this regexp consists of taking our "demo" page for the
rich text control through the cycle of triggering edit, saving immediately, and seeing if the "undo" widget
appears or not.

Some further things we could say about the regexp - if its domain were intended to be all of XML, it would
be no good - http://www.w3.org/TR/2004/REC-xml11-20040204/ - there are numerous characters in "NameChar" and
"NameStartChar" that aren't covered. However, given it is only intended to operate on the standard dialect
of HTML tags, it is probably good enough - http://www.w3.org/TR/html401/sgml/dtd.html .

Another thing to say about the regexp.... it looks like it is trying to "do that thing which should never be
done", which is, "to parse HTML using a regexp" -

All we can say here is to fall back on the role of the regexp precisely - just as it is not intended for
"validation", it is also not intended for "parsing" - all we need is "good enough canonicalisation for this
purpose" which I think that Cindy's regexp achieves - so long as it passes the manual test. This concept of
"good enough" does depend on the exact rich text controls that we are supporting - the task may have got
easier since we ditched FCKEditor.

The final thing we could say about the regexp - this dates from an era where our community standards for
these kinds of things were lower than they are now, otherwise the author (in this instance, me) would have
put a comment linking to a JIRA explaining at least some of these issues which are listed above :)

The bottom line - since the regexp is used for canonicalisation rather than validation, I don't think
Crockford's warning needs to be taken seriously - however, we may as well get rid of it by rewriting the
expression if we can, as long as we can verify that the expression is still good for its original purpose by
running manual tests on our integrations demo for the rich text widgets we are intending to support.

More information about the fluid-work mailing list