Wednesday, October 22, 2008

Search Technology More Difficult Than Originally Thought

It's been a few weeks now, and the Google search widget that we deployed has yet to yield even a minimal return. It was exceptionally easy to deploy, but there are a few concerns that I'm itemizing below.

First, the search widget simply isn't generating results. All dynamically-generated .php pages have been specified in the Sitemap, but not all of them have been indexed. It's understandable that it will take some time for Google to index pages, but several weeks is too long to wait, and we're not sure why it's taking so long. We're generating interesting and timely information that no one will ever find through Google Search, let alone through the search widget deployed on the site, if the content isn't indexed. (This is actually more of a comment on how quickly Google incorporates available sitemaps into its index.)

Second (and this extends the point above, but specifically concerns the Google search widget deployed on the site), the search widget isn't even reflecting the Sitemap that Google can freely access. It's one thing for the site to be indexed as part of Google Search, but we're talking about something even simpler here . . Just make the pages in the Sitemap available to the search widget. Why does the search widget need to reference Google's entire index ? Why can't it simply make use of the site's own Sitemap ? It would seem they have everything they need . . a sitemap and a search widget. That's basically what Google Search is for the entire internet.
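To make the "sitemap plus search box" idea concrete, here is a minimal sketch in Python of a site-only search built from nothing but the Sitemap. This is not how Google's widget works; the function names are made up for illustration, and the sketch assumes the rendered text of each page has already been fetched and is passed in as a plain dictionary.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)]


def build_index(pages):
    """Map each lowercase word to the set of URLs whose rendered text contains it.

    `pages` is a hypothetical dict of URL -> rendered page text.
    """
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query (simple AND search)."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results
```

Because the index is built only from pages named in the Sitemap, every listed page is searchable by construction, which is the behavior I was hoping the widget would have.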

Third, there are generally a boatload of ads presented in the search results. This isn't to bicker with the Google search widget being FREE, which is great, but there are typically too many. The number varies, so it's impossible to offer a clear-cut example. Nonetheless, they take up too much space, clutter the search result page, and even make it hard to see the search results, which are the whole point of deploying Google's search widget in the first place.

More comments and suggestions to come.

Tuesday, October 14, 2008

Early Results . . and Traction

We've actually been getting a good response already, so we've been investing a bit more effort into the mechanics of the site. That’s why you haven’t seen more posts here on the blog.

But the good news is that you’ll soon find emailorspam.com easier to find and use, you’ll have access to even more information than originally anticipated, and we’ll be able to do all this for less overhead. That means we’ll be able to spend more time generating more useful data (as opposed to just spending time ‘making the wheels turn’). Think of this as a bit of a Twitter to keep posts updated.

Wednesday, October 8, 2008

Stopped Using Yahoo! Personal Site Search

This is just a quick post to share that we've stopped using the Yahoo! Personal Site Search. We may give it another shot at some point in the future, but for whatever reason, it simply insisted on scanning the raw .php, skipping the process of actually rendering the dynamically-generated .php content based on the Sitemap and, thereby, 'missing' the actual content that makes the site worth visiting. Not sure why they elected to do it this way, but it's suboptimal to say the least -- After all, it's the content that makes any site worthwhile.

Why is this worth mentioning ? Well, if you're hosting with Yahoo!, publishing dynamic content, and interested in deploying Yahoo! Personal Site Search, then you'll perhaps want to read this and weigh your options and the potential benefits. It didn't work for our .php pages -- Will it work for you ? Also, it's funny how the internet works. I tried contacting Yahoo! with the above question and never received a response . . Maybe this will catch their attention. I'm not saying I'm unhappy with Yahoo! -- I'm just saying I deserved a reply to my e-mail to their customer service.

Friday, October 3, 2008

Fidgeting with Search Technology

Perhaps the only surprising thing thus far has been how much effort (and how long) it takes to both integrate search technology and then see it actually generate something useful. It’s really interesting, and here’s what I've learned so far.

Yahoo! offers an excellent search widget (officially called Personal Site Search). (FYI, their search widget is basically a little search window that allows users to search your site’s content . . VERY useful and a huge savings on what it would take to build this yourself.) The problem, however, is that their search widget doesn't appear to talk to their retrieval and indexing of sitemaps. Because the two don’t seem to talk, early indications are that the search results never display dynamically-generated content for legitimate .php (and other dynamic) pages that are only specified in the Sitemap. I am unsure why that would be, but I found similar comments posted elsewhere on the web. I also never received a reply from Yahoo! on this question, though I will defend them on the general excellence of their On-Line Help.

An even more technical perspective ? It seems like Yahoo!'s efforts to index sitemaps and the way they've deployed their search widget (Personal Site Search) simply don’t talk to each other. Furthermore, early results indicate that Yahoo! scans the raw pages for content rather than rendering the pages first and then scanning them. Why would this make a difference ? Well, if you have dynamically-generated pages (.php or others) and you're hosting with Yahoo!, it seems as if Yahoo! just looks at the ‘template’ and skips rendering the (dynamic) content that actually makes the page useful and, more importantly, valuable.
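Here is a small sketch of why scanning the template differs from scanning the rendered page. It is written in Python rather than PHP, and everything in it (the template text, the placeholder name, the helper functions) is a made-up stand-in for illustration, not anything taken from Yahoo! or from the actual site.

```python
import re
from string import Template

# Hypothetical stand-in for a dynamic page: the raw source holds only a placeholder.
RAW_SOURCE = Template("<h1>Report</h1><p>$verdict</p>")


def render(template, data):
    """Fill in the placeholders, as the web server would before sending the page."""
    return template.substitute(data)


def words(text):
    """Return the set of lowercase words in the text, ignoring HTML tags."""
    no_tags = re.sub(r"<[^>]+>", " ", text)
    return set(re.findall(r"[a-z0-9]+", no_tags.lower()))


# What an indexer sees if it reads the raw source without rendering it.
raw_words = words(RAW_SOURCE.template)

# What an indexer sees if it reads the page a visitor actually receives.
rendered_words = words(render(RAW_SOURCE, {"verdict": "This sender is spam"}))
```

A search for "spam" can only match `rendered_words`; the raw source contains the placeholder name but none of the content a visitor would be looking for.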

So, I’m still experimenting, but I’m also exploring the Google front . . .

Google also offers a similar service. For free (ad driven) or $100+ (no ads) you can get a search widget for your own site. It works intuitively and, in my opinion, the way something like this should . . Basically, you build pages, alert Google (and other search engines) to those pages via a Sitemap, and then wait for Google to index them. The two concerns at the moment are that 1) it obviously takes time for this to ripple through Google’s constant efforts to index the internet and 2) it is as yet unclear whether ALL pages specified in the Sitemap will actually appear in search results.
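For anyone who hasn't built a Sitemap before, the file itself is simple: an XML list of page URLs in the sitemaps.org format. Below is a minimal sketch in Python that writes one. The function name is my own and the page URLs in the usage note are invented examples; only the XML namespace comes from the Sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace required by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped for us
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap(["http://emailorspam.com/index.php", "http://emailorspam.com/report.php?id=1"])` (hypothetical paths) returns the XML as a string, which would then be saved as sitemap.xml and submitted to the search engines.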

Why is the second point a big deal ? Well, assuming you built a page with useful content and are relying on search for people to be able to find it (not everything can have a link on the homepage), then it’s unlikely that anyone will ever find the content if it’s not actually indexed . . In other words, any page in the Sitemap that Google skips may never be seen by those who might actually be looking for that exact content. Why is this worth delving into ? Well, I can perfectly understand that Google may not rate a particular page worthy of indexing relative to all that’s available on the internet (that’s pretty stiff competition). But don’t confuse that with making an entire site’s content available to an actual visitor. If a visitor is on your site, it obviously behooves you to make ALL of your site's content available. The concern is that, while easy to deploy, Google's search widget does not appear to return complete search results (that is, every page listed in the Sitemap), which essentially 'hides' the very content visitors may be looking for.
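One way to put a number on this concern is to compare the Sitemap against whatever the search actually returns. The sketch below assumes you have collected, by hand or otherwise, a list of the URLs that do show up in results; both function names are hypothetical.

```python
def missing_from_index(sitemap_urls, indexed_urls):
    """Return the sitemap URLs that are absent from the index, in sitemap order."""
    indexed = set(indexed_urls)
    return [url for url in sitemap_urls if url not in indexed]


def coverage(sitemap_urls, indexed_urls):
    """Return the fraction of sitemap URLs found in the index (0.0 for an empty sitemap)."""
    if not sitemap_urls:
        return 0.0
    missing = missing_from_index(sitemap_urls, indexed_urls)
    return 1 - len(missing) / len(sitemap_urls)
```

Anything returned by `missing_from_index` is a page that visitors cannot reach through the search box, no matter how useful its content is.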

More testing is underway, but we’re hoping one of these two approaches works out and, more importantly, helps people find content as organically and smoothly as possible.