Kettle and the Last Thread-Local
Antranig Basman
antranig.basman at colorado.edu
Wed Aug 28 05:59:30 UTC 2013
Those not interested in these technical issues can breeze on by, but I wanted to summarise our thinking of
today whilst it was still fresh in my mind.
Summary is at the top this time, having learned from my 2011 ramble:
1. In theory, in a "pure architecture", we would be able to remove all ThreadLocals from client code (Kettle)
as well as from the core framework (Infusion).
2. In practice, because of error handling and debuggability concerns, we can't.
Last week I assigned Yura to http://issues.fluidproject.org/browse/KETTLE-16 , a simplification of Kettle's
infrastructure that seemed possible following improvements in Infusion. In the improvements described in
FLUID-4330 and others, I was progressively able to remove uses of "ThreadLocals" wherever they occurred in
our base framework, as a result of the better record-keeping that allowed us to associate components with
the original instantiators. This simple lookup is now performed by the utility fluid.getInstantiator -
https://github.com/fluid-project/infusion/blob/master/src/framework/core/js/FluidIoC.js#L749
This is a framework internal and it is a top goal that all users of the framework should be able to get
their work done without reference to it.
Before going any further I should say something about the term "ThreadLocal", which in an environment
without any threads, such as the browser or node.js, seems confusing and paradoxical. In environments that
do have threads, for example our experimental early version of Kettle from 2008, based on Java Servlets and
the Rhino JavaScript engine, the term was perfectly standard and accurate. In an environment with just one
thread, something carrying a remnant of the original ThreadLocal's mission still makes sense: it associates
a value with a particular stack frame. This is still useful with one thread since, as in the new Kettle,
multiple separate stack frames, linked as successive callbacks to sequences of asynchronous I/O, can be
morally packaged together as part of the same "unit of work", in this case ultimately aimed at serving the
same HTTP request.
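
To make the idea concrete, here is a minimal sketch of such a single-threaded "ThreadLocal". None of the
names below (makeThreadLocal, runWith, wrap) are Infusion or Kettle API; they are invented purely for
illustration, and the "pending" array merely stands in for the event loop's queue of I/O callbacks.

```javascript
// Hypothetical sketch; none of these names are Infusion or Kettle API
function makeThreadLocal() {
    var current; // the value in scope for the stack frame now executing
    var that = {};
    that.get = function () {
        return current;
    };
    // Run fn with `value` in scope, then put back whatever was there before
    that.runWith = function (value, fn) {
        var previous = current;
        current = value;
        var result = fn();
        current = previous;
        return result;
    };
    // Capture the value in scope now; reinstate it whenever the callback later runs
    that.wrap = function (callback) {
        var captured = current;
        return function () {
            var self = this, args = arguments;
            return that.runWith(captured, function () {
                return callback.apply(self, args);
            });
        };
    };
    return that;
}

var requestLocal = makeThreadLocal();
var pending = []; // stands in for the event loop's queue of I/O callbacks

// Two "requests" each schedule a callback whilst their own value is in scope
["/a", "/b"].forEach(function (url) {
    requestLocal.runWith({url: url}, function () {
        pending.push(requestLocal.wrap(function () {
            return requestLocal.get().url;
        }));
    });
});

// Later, from fresh stack frames, each callback still sees its own request
var seen = pending.map(function (callback) {
    return callback();
});
```

Here "seen" ends up holding the two URLs in order, even though by the time the callbacks run nothing is in
scope at the top level. The cost is that every callback crossing an I/O boundary must be wrapped by hand.
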
In node.js, a native feature aimed at this same function is "Domains", described here -
http://nodejs.org/api/domain.html . It may be useful in future to harmonise our system with this
implementation, should it become the focus of useful infrastructure developed by others.
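
As a rough sketch of what such harmonisation might look like, the following uses node's "domain" module
to tie a request to a unit of work. The functions serveInDomain and currentRequest, and the practice of
hanging a "request" property on the domain, are my own assumptions for illustration rather than anything
node or Kettle prescribes; it is also worth checking the module's current status in the node docs before
relying on it.

```javascript
var domain = require("domain");

// Hypothetical helper: serve one unit of work inside its own domain
function serveInDomain(request, work) {
    var d = domain.create();
    d.request = request; // ad hoc property: our own convention, not node API
    d.on("error", function (err) {
        // Receives errors thrown within this domain, including from callbacks
        // that were scheduled whilst the domain was active
        request.respond(500, err.message);
    });
    return d.run(work);
}

// Anywhere beneath `work`, the active domain leads back to the request
function currentRequest() {
    return process.domain ? process.domain.request : undefined;
}

var found = serveInDomain({url: "/a", respond: function () {}}, function () {
    return currentRequest().url;
});
```

Inside the domain, currentRequest() finds the request without any reference being passed down; outside
it, there is nothing to find.
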
In the meantime, the framework's main use of ThreadLocal state is in maintaining what, from the
point of view of the IoC framework, is termed the "dynamic environment". The dynamic environment is
pictured in the following diagram from our docs: http://wiki.fluidproject.org/display/docs/Demand+Resolution
- it holds a number of IoC components which are expected to be "in scope for resolution", but only from the
point of view of the current stack frame.
This is awkward, and from the point of view of architectural clarity it would be useful to do away with the
dynamic environment entirely, which I have indeed been trying to do for a number of years now. It seemed
that allowing the framework to take care of its own bookkeeping without using it would in theory pave the
way for all client frameworks such as Kettle to act similarly. Now that all [*] uses in the core framework
are gone, unfortunately it seems that there are some special reasons that suggest that Kettle needs to stay
the way it is, and that we can't in general withdraw the framework facility of the "dynamic environment".
"In theory", we could adopt the viewpoint that "anyone who needs to perform a request-related activity will
have access to the request component". This would mean that no special activity would be required when
propagating I/O callbacks - since at the end of the ultimate callback would be a direct object reference to
the original request component which could then do its work (service and close the request). This was the
kind of thinking that was vaguely behind the issuing of KETTLE-16 - the improvements in Infusion that it
described would then be sufficient to let the IoC system recontextualise itself upon reentering code
(invokers and listeners) attached to the request component.
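
A minimal sketch of this "pure" style, under the assumption of a toy request object and a toy I/O layer
(makeRequest, readSetting, flush and handle are all invented names, not Kettle's), might run as follows.
Every callback simply closes over the request it belongs to, and no ambient state is consulted.

```javascript
// Stand-in for asynchronous I/O: callbacks are queued and run later by flush()
var queue = [];
function readSetting(key, callback) {
    queue.push(function () {
        callback(null, key + "=on");
    });
}
function flush() {
    while (queue.length > 0) {
        queue.shift()();
    }
}

// Hypothetical request component; `sent` records what would go to the client
function makeRequest(url, sent) {
    return {
        url: url,
        // Service and close the request
        finish: function (body) {
            sent.push([200, url, body]);
        }
    };
}

function handle(request) {
    // No ThreadLocal needed: the callback holds a direct reference to `request`
    readSetting("highContrast", function (err, data) {
        request.finish(data);
    });
}

var sent = [];
handle(makeRequest("/a", sent));
handle(makeRequest("/b", sent));
flush();
```

This works as long as every piece of code that might need the request is handed one, which is exactly
the assumption that breaks down below.
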
Unfortunately, for a few reasons this doesn't seem to be practical, the most urgent being our
approach to error handling. Some similarly rambling postings from 2011
http://lists.idrc.ocad.ca/pipermail/fluid-work/2011-June/007987.html and
http://lists.idrc.ocad.ca/pipermail/fluid-work/2011-May/007916.html
explain why exceptions, or in particular try-catch blocks, should be considered unusable in the JavaScript
language. A vital part of the infrastructure for stability of a Kettle application, then, is the following
"uncaughtException" handler registered directly with node:
https://github.com/fluid-project/kettle/blob/master/lib/utils.js#L29-L42
The short version of this discussion is - this handler is vital, there is no way to support it without
ThreadLocals or similar infrastructure - therefore, we can't get rid of them. The Fluid framework failure
handler below it is covered by just the same reasoning.
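
To illustrate why, here is a sketch of the general shape of such a handler. It is not the code at the
link above: currentRequest, withRequest and installHandler are hypothetical names, and the handler is
installed on a plain EventEmitter standing in for node's "process" so that the example can be run safely.
The point is that the exception arriving at the handler carries no reference to the request, so the
handler has nowhere to look but ambient state.

```javascript
var EventEmitter = require("events").EventEmitter;

// Hypothetical single-slot "ThreadLocal": the request being serviced right now
var currentRequest = null;

// Wrap an I/O callback so that `request` is in scope whilst it runs. There is
// deliberately no try/finally: if the callback throws, the slot must still
// hold the request by the time the top-level handler is reached.
function withRequest(request, callback) {
    return function () {
        currentRequest = request;
        var result = callback.apply(this, arguments);
        currentRequest = null;
        return result;
    };
}

// Install the last-resort handler on `emitter` (in a real server, `process`)
function installHandler(emitter) {
    emitter.on("uncaughtException", function (err) {
        if (currentRequest) {
            // Our only lifeline to the user: report the failure, close the request
            currentRequest.respond(500, String(err.message));
            currentRequest = null;
        } else {
            console.error("Failure outside any request: " + err.message);
        }
    });
}

var fakeProcess = new EventEmitter();
installHandler(fakeProcess);

var sent = [];
var request = {
    respond: function (status, body) {
        sent.push([status, body]);
    }
};
var callback = withRequest(request, function () {
    throw new Error("disk unreadable");
});

// The try-catch here only simulates node's own top-level dispatch of an
// exception that has escaped every frame of the callback
try {
    callback();
} catch (e) {
    fakeProcess.emit("uncaughtException", e);
}
```

After this runs, "sent" holds a single 500 response carrying the error message. Without the ambient
currentRequest the handler could only log the failure, leaving the user's request hanging.
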
The slightly longer version of this discussion, for anyone who dreamed of wanting such a thing, involves
speculating about reasons why the server-side environment is so fundamentally different to the client-side
in this way - since it still seems perfectly possible to abolish ThreadLocals from all client-side code. The
reason I think relates to "user ergonomics" - the equivalent of the "HTTP request" on the server is simply a
GUI event on the client - and in fact, "the same user" is behind every client-side event. However, on the
server, the original request object is our only lifeline to the original user - should we lose our thread to
it, we lose any ability to signal errors or other status to them. By contrast, losing a DOM "event" object
on the client merely means losing some mostly irrelevant bookkeeping about the coordinates of the mouse
pointer at a particular time, and what else the user was doing then. Should we need to catch up with the
user later, they are behind 100% of our markup interface and will most likely be generating a further event in
a few more milliseconds.
This vital nature of the request object also carries through to thinking about debugging - this kind of
issue led the node.js team, typically rather hard-nosed and not prone to take debuggability concerns very
seriously, to invent the idea of their "Domains" in the first place. Time and again we will most likely find
ourselves in some unfathomable callback nest in node.js, and without the ability to lay our hands on the
original (node/express/kettle) "request component" ultimately responsible for the current stack frame, we
will be quite in the dark about what the problem is about. Someday, no doubt, we will integrate this
facility into whatever debugging tools we start applying for use with the Kettle architecture. On which
issue, I'm happy to report that the previously apparently moribund "node-inspector" project appears to be
springing to life again, under some active new management from the Czech Republic:
https://github.com/node-inspector/node-inspector
Summary is at the top, having learned from Colin's response to my 2011 mail :)
[*] This is not quite true - one last use remains as part of the "source tracking" system in the
ChangeApplier, described in JIRAs http://issues.fluidproject.org/browse/FLUID-4679 and
http://issues.fluidproject.org/browse/FLUID-4633 - this should be removed, which would actually resolve
FLUID-4679 which describes a case in which the current ThreadLocal system gives the wrong answer.