Friday, October 3, 2008

Fidgeting with Search Technology

Perhaps the only surprising thing so far has been how long it takes both to integrate search technology and to see it actually produce something useful. It’s an interesting process, and here’s what I’ve learned so far.

Yahoo! offers an excellent search widget, officially called Personal Site Search. (FYI, the widget is basically a small search box that lets visitors search your site’s content. It’s VERY useful and a huge savings over building this yourself.) The problem is that the widget doesn’t appear to talk to Yahoo!’s retrieval and indexing of Sitemaps. Because the two don’t seem to communicate, early indications are that search results never include legitimate dynamically generated pages (.php and other dynamic content) that are listed only in the Sitemap. I’m not sure why that would be, but I’ve found similar comments posted elsewhere on the web. I never received a reply from Yahoo! on this question, though I will defend the general excellence of their On-Line Help.

An even more technical perspective? It seems that Yahoo!’s efforts to index Sitemaps and the way they’ve deployed their search widget (Personal Site Search) simply don’t talk to each other. Furthermore, early results indicate that Yahoo! only scans pages for content, rather than rendering pages first and then scanning them. Why would this make a difference? If you have dynamically generated pages (.php or others) and you’re hosting with Yahoo!, it seems as if Yahoo! just looks at the ‘template’ and skips rendering the actual dynamic content that makes the page useful and, more importantly, valuable.

So I’m still experimenting, but I’m also exploring the Google front...

Google also offers a similar service: a search widget for your own site, either free (ad-supported) or for $100+ (no ads). It works intuitively and, in my opinion, the way something like this should. You build pages, notify Google (and other search engines) of those pages via a Sitemap, and then wait for Google to index them. The two concerns at the moment are that 1) it obviously takes time for new pages to ripple through Google’s constant efforts to index the internet, and 2) it is not yet clear whether ALL pages listed in the Sitemap will actually appear in search results.
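For readers who haven’t built one, a Sitemap is just an XML file listing a site’s page URLs in the format described at sitemaps.org. The Python sketch below is a minimal illustration, not the tooling used for this site, that builds such a file from a list of URLs; the example.com addresses and the article.php page are hypothetical. It fills in only the required loc element and leaves out the optional lastmod, changefreq, and priority tags.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemap protocol (version 0.9).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a Sitemap XML document (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in query strings automatically.
        ET.SubElement(url_el, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, including a dynamic .php page reachable only via the Sitemap.
pages = [
    "http://www.example.com/",
    "http://www.example.com/article.php?id=42",
]
print(build_sitemap(pages))
```

Listing a dynamic page this way only tells the search engine it exists; whether it is crawled, rendered, and returned by a site-search widget is up to the search provider, which is exactly what the rest of this post is trying to find out.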

Why is the second point a big deal? Suppose you built a page with useful content and are relying on search for people to find it (not everything can have a link on the homepage). If that page isn’t indexed, no one will ever find it through search. In other words, if Google skips some of the pages listed in the Sitemap, visitors looking for exactly that content may never see it. Why is this worth delving into? I can perfectly understand that Google may not rate a particular page worthy of indexing relative to everything else available on the internet (that’s pretty stiff competition). But don’t confuse that with making an entire site’s content available to an actual visitor. If someone is visiting your site, it obviously behooves you to make ALL of your site’s content available. The concern is that, while easy to deploy, Google’s search widget does not appear to return complete search results (relative to what the Sitemap lists), which essentially ‘hides’ the very content visitors may be looking for.
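One way to test the second concern is to compare the URLs listed in the Sitemap with the URLs that actually show up in the widget’s results. The Python sketch below assumes you have collected the result URLs by hand (the `seen` list and the example.com addresses are hypothetical); it simply reports which Sitemap entries never appeared.

```python
import xml.etree.ElementTree as ET

# Prefix mapping for the Sitemap protocol namespace, used in findall() paths.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the set of <loc> URLs found in a Sitemap XML string."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def missing_from_search(sitemap_xml, urls_seen_in_results):
    """Return, sorted, the Sitemap URLs that never appeared in the search results."""
    return sorted(sitemap_urls(sitemap_xml) - set(urls_seen_in_results))


sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/article.php?id=42</loc></url>
</urlset>"""

# Hypothetical: URLs noted down after running test queries through the widget.
seen = ["http://www.example.com/"]
print(missing_from_search(sitemap, seen))  # ['http://www.example.com/article.php?id=42']
```

A non-empty result only shows which pages haven’t surfaced for the queries tried so far; it doesn’t distinguish “not indexed yet” from “never going to be indexed,” so it’s worth rerunning over time.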

More testing is underway, but we’re hoping one of these two approaches works out and, more importantly, helps people find content as organically and smoothly as possible.
