Architectural Issues around Text to Speech

Justin Obara obara.justin at gmail.com
Fri Apr 19 10:39:05 EDT 2013


Yesterday afternoon we started talking and tasking out the work for implementing the text-to-speech feature in UI options. The notes from that meeting are up on the wiki.
http://wiki.fluidproject.org/display/fluid/UI+Options+Text+to+Speech+Tasking

In looking through the designs, several architectural issues have arisen. In the current designs, text-to-speech will read out the contents of a predesignated section on the page, likely the contents of the <article>. The user will be able to play and pause the reading, and will also be able to select a portion of the text, via keyboard or mouse, to start reading from. This raises three high-level questions.

How do we make a selection?
How do we start reading from that selection?
How do we know when a selection was made?

How do we make a selection?
=======================

Mouse: 

This is straightforward, and should likely be supported on any system that supports a mouse.


Touch: 

This is also likely handled by any current OS that supports touch.


Keyboard:

We should be able to make use of the browser's built-in caret navigation, although this may require the user to enable it in the browser's settings. Safari and Chrome (tested on Mac OS X) seem to behave the same, in that you have to first double-click on a word before you can use the keyboard to modify the selection. This interaction is not ideal, as the user would still have to use the mouse to start the selection.

http://hkitago.com/2009/03/safari-and-caret-browsing/
http://windows.microsoft.com/en-CA/windows7/select-text-and-move-around-a-webpage-with-your-keyboard


How do we start reading from that selection?
====================================

This question was particularly nebulous. We would have to know what was selected, what DOM node that selection was from, and where in that DOM node the selection came from. 

Example 1:

<p> A fool thinks himself to be wise, but a <strong>wise man</strong> knows himself to be a fool.</p>

In Example 1, suppose we select "a <strong>wise". There are at least two potential issues: 1) starting in the middle of a DOM node, and 2) crossing DOM nodes.


Example 2:

<p> Give every man thy ear, but few thy voice. </p>

In Example 2, suppose we selected "thy". Since the word "thy" is contained within the text for the node multiple times, how would we know which one was correct?
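
To illustrate the ambiguity, here is a minimal sketch. The helper name findOccurrences is hypothetical and only for illustration; it shows that matching on the selected text alone yields more than one candidate position.

```javascript
// Hypothetical helper: list every index at which `needle` occurs in `haystack`.
function findOccurrences(haystack, needle) {
  const positions = [];
  if (needle === "") {
    return positions; // an empty selection matches nothing useful
  }
  let from = haystack.indexOf(needle);
  while (from !== -1) {
    positions.push(from);
    from = haystack.indexOf(needle, from + 1);
  }
  return positions;
}

const sentence = "Give every man thy ear, but few thy voice.";
// Two candidates come back, so the selected text alone cannot tell us
// which "thy" the user meant.
console.log(findOccurrences(sentence, "thy")); // [ 15, 32 ]
```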


One possible approach would be to make use of window.getSelection(). This will provide us with a selection object that we can use to get the selected text as well as the node(s) that the selection starts and ends in. We should also be able to determine where in the DOM node the text selection is, making it possible to distinguish between multiple occurrences of the same text.

https://developer.mozilla.org/en/docs/DOM/Selection
http://blogs.msdn.com/b/ie/archive/2010/05/11/dom-range.aspx
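
As a rough sketch of how the node and offset from a selection might be turned into a single reading position, the function below walks the text nodes under a container and counts characters. The name textOffsetWithin is hypothetical, and the sketch assumes the selection boundary falls inside a text node (a boundary reported against an element node is not handled and returns -1).

```javascript
const TEXT_NODE = 3; // same value as Node.TEXT_NODE in the DOM

// Hypothetical helper: character offset of (node, offset) within the
// concatenated text of `root`, or -1 if `node` is not a text node under `root`.
function textOffsetWithin(root, node, offset) {
  let count = 0;
  let found = -1;
  (function walk(current) {
    if (found !== -1) {
      return; // already located, stop descending
    }
    if (current.nodeType === TEXT_NODE) {
      if (current === node) {
        found = count + offset;
      } else {
        count += current.nodeValue.length;
      }
      return;
    }
    // Array.from accepts both a browser NodeList and a plain array.
    Array.from(current.childNodes || []).forEach(walk);
  })(root);
  return found;
}

// Possible browser usage (assumes a selection exists and an <article> is present):
//   const range = window.getSelection().getRangeAt(0);
//   const article = document.querySelector("article");
//   const start = textOffsetWithin(article, range.startContainer, range.startOffset);
```

Because the result is a position in the container's full text rather than the selected string, the two occurrences of "thy" in Example 2 would map to different offsets, and a selection that begins inside the <strong> in Example 1 still resolves to one position.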

There is a question of browser support, particularly for IE 8 and below, but we might be able to find a polyfill to help with that.

http://www.quirksmode.org/dom/range_intro.html
http://code.google.com/p/rangy/


How do we know when a selection was made?
====================================

There don't seem to be any specific selection events that we could listen to. However, we could probably use mouse presses, key presses and touch events to trigger a check of the selection object (see above).

http://stackoverflow.com/questions/2859985/event-on-html-selection
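
A minimal sketch of that idea follows. The function name watchSelection, its parameters, and the choice of mouseup, keyup and touchend as triggers are all assumptions for illustration, not a settled design. It re-reads the selection after each triggering event and only reports when the selected text has changed.

```javascript
// Hypothetical helper: poll the selection whenever one of the given events fires.
//   target        - anything with addEventListener, e.g. document
//   readSelection - function returning the current selection (or its text)
//   onChange      - called with the selected text when it differs from last time
function watchSelection(target, readSelection, onChange,
                        eventNames = ["mouseup", "keyup", "touchend"]) {
  let last = "";
  const check = () => {
    // String() on a Selection object yields the selected text.
    const text = String(readSelection());
    if (text !== last) {
      last = text;
      onChange(text);
    }
  };
  eventNames.forEach((name) => target.addEventListener(name, check));
  // Return a function that removes the listeners again.
  return () => eventNames.forEach((name) => target.removeEventListener(name, check));
}

// Possible browser usage:
//   const stop = watchSelection(document, () => window.getSelection(),
//                               (text) => console.log("selection is now:", text));
```

Comparing against the previous text keeps ordinary clicks and key presses that do not alter the selection from triggering a new reading position.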


Thanks
Justin