I must say that I would also welcome some documentation on the decisions that were made, as viewing the actual data has left me with a number of questions. I'm going to begin my comments with a question about scope, and with some confusion that the scope is causing me as I think about how I would want to use this data.
What's an LC Subject Heading?
It appears that the LCSH file that is online represents those authority records whose LC control number begins with "sh", as in: sh 00009880. (These number 342,684 records.) However, if you do a Subject Authority Headings search in the LC authorities database you will retrieve any authority record that can be used as a subject. This means that you will retrieve personal names, corporate names, and geographic entities that can be used as subjects. (Note that this is probably a large portion of the name authority file.) The result is a mixture of records with LCCNs that begin with "n" (for the name file) and those that begin with "sh" (for the subject heading file). I'm at a loss to explain or understand what determines whether a heading has an LCCN beginning with "sh" and would love to get an explanation.
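To make the prefix distinction concrete, here is a minimal Python sketch that sorts LCCNs into files by their alphabetic prefix. The function name `authority_file` is my own invention, the rule that any prefix starting with "n" means the name file is an assumption based only on what I see in the data, and the "n 00000000" value is a made-up placeholder rather than a real control number.

```python
import re

def authority_file(lccn: str) -> str:
    """Guess the authority file from an LCCN's alphabetic prefix.

    Assumption: "sh" means the subject heading file and any prefix
    starting with "n" means the name file. This is inferred from the
    data, not taken from LC documentation.
    """
    match = re.match(r"\s*([a-z]+)", lccn.lower())
    prefix = match.group(1) if match else ""
    if prefix == "sh":
        return "subject"
    if prefix.startswith("n"):
        return "name"
    return "unknown"

# The "sh" values appear in this post; "n 00000000" is a hypothetical placeholder.
for lccn in ["sh 00009880", "sh 85069035", "sh2002000509", "n 00000000"]:
    print(lccn, "->", authority_file(lccn))
```

Note that this tells you only which file a record lives in, not whether the heading can be used as a subject, which is exactly the gap I am describing.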
The result is that a search in the LCSH file on the word "Italy" brings up 3,516 headings, with the word somewhere in the heading. However, the heading "Italy" alone is not included. You do have:
Italy, Central
and you have:
Italy, Northern--Civilization
Italy, Northern--Civilization--Germanic influences
But not "Italy."
A search in the name heading database on LC's online authority file yields a name heading entry for "Italy." That database (whose response is in the form of a browse list) has innumerable pages for corporate names under the initial term "Italy":
Italy.
Italy. Ambasciata (India)
Italy. Confederazione fascista degli industriali.
It also includes "Italy, Southern" with its LC control number "sh 85069035".
The upshot is that the LC subject heading file at http://id.loc.gov is not the same as a subject heading search in the online authorities database. It also isn't always logical which file a heading falls into. The heading "Italy. Ambasciata (India)" is in the name heading file as a corporate name, but "Palazzo Dell'Ambasciata di Spagna (Rome, Italy)" is in the subject heading file as a corporate name. There undoubtedly is a set of rules that explains all of this, but it seems to me that a separation of the subject file and the name files creates a split between headings that will not be mirrored in actual use.
This may not matter if the files are combined in the end, and the URI makes it look like all authorities will have ids that directly follow "/authorities/" in the URI. However, although they are both coded as corporate names, the "Palazzo... " record gets the "cool URI" http://id.loc.gov/authorities/sh2002000509#concept. Note the ending in "concept". I don't know what hash ending will be given to entries from the names file, but I do find it odd that corporate names could have two different hash endings, depending on which file they are from. To be frank, especially since the division into different files doesn't seem terribly logical, and since many items in the name file can also be used as concepts, I would prefer that the "#" indicate the type of heading (personal name, corporate name, conference, geographical name, topic) rather than the file that it comes from. That is, the "#" would reflect the MARC tag: 100, 110, 111, 150, 151.
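What I have in mind could be sketched roughly as follows in Python. The fragment names (`personalName`, `corporateName`, and so on) and the `typed_uri` function are purely hypothetical; nothing here reflects what id.loc.gov actually does or plans to do. Only the base URI and the existing "concept" ending come from the service itself.

```python
# Hypothetical mapping from MARC authority heading tag to a URI fragment.
# The fragment names are illustrative assumptions, not an LC convention.
HEADING_TYPES = {
    "100": "personalName",
    "110": "corporateName",
    "111": "conference",
    "150": "topic",
    "151": "geographicName",
}

def typed_uri(lccn: str, marc_tag: str) -> str:
    """Build an authority URI whose fragment reflects the heading type.

    Falls back to the "concept" ending currently in use when the tag
    is not one of the five listed above.
    """
    base = "http://id.loc.gov/authorities/"
    fragment = HEADING_TYPES.get(marc_tag, "concept")
    return f"{base}{lccn.replace(' ', '')}#{fragment}"

# The "Palazzo..." record is coded as a corporate name (110).
print(typed_uri("sh2002000509", "110"))
```

With a scheme like this, a corporate name would get the same hash ending whether its record came from the "sh" file or the name file.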