Gregor Heinrich has a good paper on Latent Dirichlet Allocation, which I believe is an extension of Latent Semantic Analysis. It is a model which can be used to group documents based on semantic content. He gives the mathematical details and the "punchlines" for implementation. The model takes as input a collection of documents and outputs a topic label for each word in each document. The documents can be plotted in K-dimensional space, where K is the number of possible topics, by using the proportion of each topic in a document as its coordinates. Documents which are closer to each other have topics more similar to each other. You could then use your favorite clustering algorithm or a simple distance threshold to decide which should documents should link to each other.
http://en.wikipedia.org/wiki/Latent_semantic_analysis
I found examples in ruby and python in this blog, which is unfortunately down just at the moment http://blog.josephwilk.net/ruby/latent-semantic-analysis-in-...