My Collection update

Svetoslav Nedkov snedkov at
Wed Jan 20 14:53:41 UTC 2010

Hello Colin,

For the last couple of days I've been cleaning up my code and fixing the 
issues you mentioned. I want to finish the functionality for My Collection 
and take on the object code entry (we split the work with Andrea) this week.

I've added some comments inline:

Colin Clark wrote:
> Sveto,
> Huge apologies for the delay in getting back to you with advice and 
> some code review. I've had a chance to take a look at your code, and I 
> think things are coming together nicely.
> I've included some comments and suggestions below. I'm wondering if 
> you'd also be willing to give us a code tour some time this week? That 
> will help me better understand your intentions with the code.
> On 7-Jan-10, at 12:12 PM, Svetoslav Nedkov wrote:
>> 1. The integration of the user collection with the artifact view is 
>> mostly ready. I'm currently having an issue with a selected DOM 
>> element that doesn't seem to accept click events when passed to 
>> another component, but I hope to fix that quickly tomorrow, which is 
>> why I won't go into the details now.
> Were you able to fix the issue? If not, tell us more. It sounds 
> interesting.
>> 2. To provide a better way of testing this, tomorrow I will create a 
>> script that generates empty shadow documents for the artifacts that 
>> are seen in the browse page. This way we will be able to add/remove 
>> all the artifacts that we currently see.
>> 3. Another concern I have is regarding the data structure we 
>> use. The last time we talked about this we settled on a centralized 
>> user database, but I understand that this is not planned, so I intend 
>> to remove it completely, replacing it with a suitable CouchDB view 
>> that will be used only for getting data. This will eliminate the 
>> problem with redundancy I've mentioned in my previous email.
>> I'd like to hear your opinion on the subject.
> I'm afraid I've caused terrible confusion around the issue of shadow 
> databases and how collections should relate to artifacts. This was 
> undoubtedly inspired by some bad code I sketched out while talking 
> about the idea of shadows at the dev meeting back in November, and I'm 
> really sorry for the confusion. Let's see if I can try to clear this up.
> Justin is right, the point of shadow documents is to maintain two 
> different "namespaces" for writing data. The first contains data 
> sourced from the museum directly, in its original format. Everything 
> in the database from Engage 0.1 fits within this category, since it's 
> all read-only.
> The second document, or shadow, stores any contributions from users 
> that apply to a particular museum-sourced entity. So, for example, if 
> we wanted to add an array of user tags to each artifact, we'd write 
> them to the shadow database instead of modifying the museum-derived 
> document directly. That way we can clearly identify where data 
> originated so it can move freely back upstream to the museum if 
> needed.
> In trying to illustrate this at the dev meeting a while ago, I 
> incorrectly suggested that pointers to the collection should be 
> located within the artifact itself. That's not necessary, and it's 
> much simpler just to have collection documents refer to artifacts. You 
> had it right the first time.
> Circular references are, as you pointed out in your last email, 
> problematic. Having the artifact/collection "relationship" stored in 
> both documents is unnecessary and does raise the sorts of 
> transactional issues that a well-designed Couch database needn't 
> ordinarily be too concerned with.
> So, I'd suggest getting rid of any references to collections within 
> artifact documents. That way, you won't even need to maintain a shadow 
> artifact document at all, and you can simply write to the collection 
> document without concern for shadows or mapping from a museum schema. 
> Just write to your collection document and you're done; this should 
> simplify your code a fair bit.
> As for your specific question, I agree that we'll probably often have 
> views in Couch that will provide a merged, read-only view of an entity 
> containing data from both the main document and its shadow. We'll also 
> have some infrastructure in our data access layer on the server that 
> takes care of writing to the shadow. It's not something we've worked 
> out yet, but your suggestion of creating shadows on the fly when 
> they're not there sounds like a reasonable approach.
> The good news is that so far we don't really have a need for shadow 
> documents, so we can sidestep this complexity. I expect in the future 
> we'll probably have to tackle these issues, but for now we needn't 
> sweat it. Sorry for the confusion.
>> 4. I think that generating a CouchDB unique id for the user 
>> session is a good idea. Just to clarify: will we create a document 
>> for the session that can be expanded in the future, or for now just 
>> use the functionality that allows us to generate uuids?
> Not wanting to risk any ambiguity, I think we should treat these as 
> user IDs, rather than session IDs. They won't correspond to any formal 
> session state on the server-side (we don't have session state), and 
> they are really a way for us to keep track of a particular user. Once 
> the designers have resolved how logins will work, I assume that we'll 
> keep track of user login/password information via these ids as well. 
> So, inspired by how you've designed collection documents in the 
> "users" database, here's how I'm thinking we might represent it all:
> {
>   type: "user",
>   _id: <crazy-long-couch-uuid-here>,
>   email: <not used at first, but perhaps eventually filled in by the 
> user>,
>   collection: {
>     artifacts: [
>       {
>         museumId: "mmi",
>         id: <crazy-long-couch-artifact-id-here>
>       }
>     ]
>   }
> }
> In effect, it's the same structure that you've laid out, except that 
> the document represents the whole user rather than just the 
> collection. Does this seem like a reasonable approach, or am I missing 
> anything obvious?
> So, onto some code review:
> * Standalone previewability: Sometimes it's really nice to test a 
> component without needing the server or database running. I couldn't 
> get the MyCollection component to run standalone due to some path 
> problems. I also didn't see any sample data, so you'll probably want 
> to implement that as well. Take a look at the other component or the 
> work Boyan has done with Capture for reference. It's a bit of extra 
> work, but really helpful.
I've added some test data and it is possible to open the collection page 
without a server. Hope this is enough (no need to simulate the 
collect/uncollect functionality in offline mode).

> * Minor path issue: when I checked out your code, you've got Infusion 
> in a directory called "infusion," but your paths refer to 
> "fluid-infusion." I renamed the directory and it worked fine. To 
> simplify things, I'd suggest just bringing in Infusion as an external. 
> We still need a better way for non-committers to work on release-level 
> code (branching is all we've got at the moment--wish we were using 
> Git), so it's something we'll try to talk about at the dev meeting 
> next week.
> * You mount your myCollection data feed and template inside the 
> "/artifacts" URL space. I'm thinking that since these documents may 
> actually represent users, we should mount them as a top level 
> resource. Here's a sketch for now, and then we can consider a more 
> resource-oriented (rather than view-oriented) approach later:
>    User data feed:
>    MyCollection template:
I changed the mount point to your suggestion. Will this be enough for 
the 0.3 release?
> * I'm not fully clear on what's happening in your render() method in 
> the MyCollection.js component. I'm confused about the block where you 
> call fetchTemplates() around lines 122-133. If you're calling 
> reRender(), you should already have the parsed templates and don't 
> need to fetch the raw HTML template again, right?
This was my mistake; I didn't fully understand how the code works. It is 
fixed now.

> * Could some of the code in your component--such as getArtifactIds() 
> and the other get...() functions--be implemented as Couch views or 
> model mapping functions instead?
I need further clarification from you on this point.
I think that if we use a view for retrieving the artifact ids, this will 
make the overall loading of the collection page slower. I looked at the 
code and couldn't find a way to optimize it much, besides building a 
single object that holds the data that is currently collected in two 
passes over the raw data. The steps I take to assemble a collection for 
a specific user are:

- get the user record from couch, which contains an array of artifacts, 
each consisting of a museum id and an artifact id
- extract the artifact ids so as to restore the original order at the end 
(the order of artifacts in the database is the order to display in the page)
- organize the artifacts by museum in an associative array
- compile couchdb query urls for each museum - those urls have a list of 
the artifacts that we need to retrieve
- use the originally extracted artifact ids with the retrieved data to 
restore the original order

This sequence is complicated and some of the functions are misnamed, but I 
couldn't find a simpler way.
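
To make the steps above easier to discuss, here is a rough sketch of them 
in plain JavaScript. It is not the code from MyCollection.js: the function 
names, the sample ids, and the query URL layout (one database per museum, 
queried through _all_docs with a keys parameter) are assumptions made only 
for illustration. Fetching the user record and running the queries are 
left out.

```javascript
// Hypothetical helpers; names and URL layout are assumptions, not the real code.

// Remember the display order (the order stored in the user record).
function extractArtifactIds(artifacts) {
    return artifacts.map(function (artifact) {
        return artifact.id;
    });
}

// Organize the artifact ids by museum in an associative array.
function groupByMuseum(artifacts) {
    var byMuseum = {};
    artifacts.forEach(function (artifact) {
        if (!Object.prototype.hasOwnProperty.call(byMuseum, artifact.museumId)) {
            byMuseum[artifact.museumId] = [];
        }
        byMuseum[artifact.museumId].push(artifact.id);
    });
    return byMuseum;
}

// Compile one query URL per museum, listing the artifacts to retrieve.
// The URL shape below is an assumed example, not the project's real one.
function buildQueryUrls(byMuseum) {
    return Object.keys(byMuseum).map(function (museumId) {
        var keys = encodeURIComponent(JSON.stringify(byMuseum[museumId]));
        return "/" + museumId + "/_all_docs?include_docs=true&keys=" + keys;
    });
}

// Put the retrieved documents back into the original order.
// Ids with no retrieved document are skipped.
function restoreOrder(orderedIds, fetchedDocs) {
    var byId = {};
    fetchedDocs.forEach(function (doc) {
        byId[doc._id] = doc;
    });
    return orderedIds.filter(function (id) {
        return Object.prototype.hasOwnProperty.call(byId, id);
    }).map(function (id) {
        return byId[id];
    });
}
```

Written this way, the grouping and the reordering are each a single pass 
over the data, which is about as far as I could see to simplify it.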

> I noticed that the code in your updateDatabase.js file could use some 
> work. Here are a few issues I noticed:
> * There's a fair bit of code duplication here. If you take a look at 
> your getCollection(), getCollectionById(), and getShadowArtifact() 
> functions, they share a fair bit of boilerplate code. It should get 
> simpler without shadow artifacts, but perhaps you can factor some of 
> this code out into a single, reusable function? collect() and 
> uncollect() also share a pattern. As an aside, this sort of data 
> access is now pretty common across all services, so Yura and I are 
> going to dig into some framework code to reduce this code redundancy 
> significantly.
I've merged collect() and uncollect() into one function and removed 
getCollectionById() and getShadowArtifact(), as they were tied to 
shadow artifacts.
As for data access, I plan to switch to jquery.couch.js at some point. I 
see that there is still debug code in there, so I will switch once it is 
ready for our use.
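
For reference, the merged function works roughly like the sketch below. 
This is a simplified illustration and not the exact code in 
updateDatabase.js: the function name, the boolean "collect" flag, and the 
in-memory user document are assumptions, and reading from and saving to 
Couch is left out. The document shape follows the user document you 
proposed earlier in this thread.

```javascript
// Hypothetical sketch: one function covering both collect and uncollect.
// "userDoc" is assumed to look like the proposed user document:
// { type: "user", collection: { artifacts: [ { museumId, id }, ... ] } }
function updateCollection(userDoc, museumId, artifactId, collect) {
    var artifacts = userDoc.collection.artifacts;

    // Find the position of the artifact in the collection, if present.
    var index = -1;
    artifacts.forEach(function (artifact, i) {
        if (artifact.museumId === museumId && artifact.id === artifactId) {
            index = i;
        }
    });

    if (collect && index === -1) {
        // Collect: append, so the stored order stays the display order.
        artifacts.push({museumId: museumId, id: artifactId});
    } else if (!collect && index !== -1) {
        // Uncollect: remove the existing entry.
        artifacts.splice(index, 1);
    }
    // Collecting twice or uncollecting a missing artifact changes nothing.
    return userDoc;
}
```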
> * I think we could be a bit more resource-oriented in our URL design 
> here. Generally, we want mounted handlers to represent a real thing in 
> the system--resources such as artifacts and collections--and then use 
> HTTP methods for operating on those resources. In particular, I wonder 
> if there's a way to implement your collection operations differently. 
> Here's a sketch off the top of my head, but it will need a bit of work to 
> think through before implementing:
>      POST adds the artifact identified by the id "abc" to the "xyz" 
> user's personal collection
>      DELETE uncollects the artifact from the user's personal collection
> I realize there's an asymmetry between this more resource-oriented 
> style of URL and some of our existing conventions. I'd like to move 
> towards a more resource-oriented way over time, but I realize it may 
> take some new infrastructure in Kettle as well as a bit of design. 
> Another topic for the dev meeting.
This is a great idea. I was in the middle of implementing it when I 
realized that I need the museum as well as the artifact to perform the 
collect/uncollect operations. One obvious solution is to encode the 
museum in the url too, but this will break the mapping between the url 
and the couch document. The other is to add a parameter, which is not 
great either. So I'll give it more thought and try to come up with a 
solution. Of course, suggestions are welcome.
I think that we also need to change the acceptor logic a bit so that it 
handles such URLs.
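
To show what the first option (museum encoded in the url) might look 
like, here is a small sketch. The path layout 
/users/<userId>/collection/<museumId>/<artifactId> and both function names 
are purely my assumptions for discussion; this is not an agreed URL scheme 
and it does not use the real Kettle acceptor code. Only the POST and 
DELETE meanings come from your suggestion.

```javascript
// Hypothetical URL layout (an assumption, not an agreed design):
//   /users/<userId>/collection/<museumId>/<artifactId>
// Returns the parts of a matching path, or null if the path does not match.
function parseCollectionPath(path) {
    var segments = path.split("/").filter(function (segment) {
        return segment !== "";
    });
    if (segments.length !== 5 ||
            segments[0] !== "users" ||
            segments[2] !== "collection") {
        return null;
    }
    return {
        userId: decodeURIComponent(segments[1]),
        museumId: decodeURIComponent(segments[3]),
        artifactId: decodeURIComponent(segments[4])
    };
}

// Map the HTTP method to the operation on the user's collection:
// POST collects the artifact, DELETE uncollects it.
function operationFor(method) {
    if (method === "POST") {
        return "collect";
    }
    if (method === "DELETE") {
        return "uncollect";
    }
    return null; // any other method is not handled by this sketch
}
```

The drawback I mentioned is visible here: the museum segment has no 
counterpart in the couch document id, so the url no longer maps directly 
to a single document.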

> Whew, super long email. Hopefully it's not too much to digest and that 
> it's helpful. Don't hesitate to keep up the thread if you have any 
> questions or if there are things I'm missing here. I'm really 
> interested in your ideas, suggestions, and alternative designs for any 
> of these issues, too!
> Colin
> ---
> Colin Clark
> Technical Lead, Fluid Project

More information about the fluid-work mailing list