Search problems
Xotli
January 16th, 2007, 08:45 PM
Okay, so something is definitely wrong with search. Try a few simple ones yourself: a search for "height", for instance, yields only two pages ... a search for "search" yields only one(!). Obviously that's not right, on either count. Also note that apparently nothing older than the past few months shows up.
I suspect that something is up with the indexes. Perhaps they just need to be rebuilt? IAE, at the very least we should all be aware of this issue. No point in telling the noobs they should be using the search to find old threads when it ain't working.
As always, if there's anything I can do to help, let me know, but it sounds like Ketch has things under control lately.
Cavalier
January 17th, 2007, 10:18 AM
Yes, this has really put a damper on me being able to say "search is your friend", because right now it isn't :cry:
From what I have been able to determine, the searches are only yielding results since the DB rebuild.
Xotli
January 17th, 2007, 11:18 PM
Yeah, that's my suspicion as well. Hopefully a simple rebuild of the indices will fix it.
Guys? any chance someone can do that? Again, if there's anything I can do to help facilitate, just let me know and I'll be happy to pitch in.
Ketch
January 18th, 2007, 09:27 AM
You are mostly correct...
DB wasn't rebuilt, if it was then there would be no posts period.
However, the search word list, which I said takes up a lot of space, was killed on the 12th.
Any posts before the 12th will not be searched, so unfortunately, all you people who defer newbies to searches will have to stop :P
Cavalier
January 18th, 2007, 09:54 AM
NNNNNNNNNNNNNNNNOOOOOOOOOOOOOOOOOOOOOOOO! This is BAD, very, very, bad! :cry:
I understand it from a size perspective, but it cuts the ability to effectively use the site by an order of degrees. Being unable to search old posts means that valuable information is essentially lost. :brickwall:
just my 2¢
Ketch
January 18th, 2007, 10:45 AM
I agree,
I hope I didn't convince them to do it. I made a comment before:
Yes, that affects DB size extraordinarily.
A big thing phpBB does, though, is categorize every single word ever used in a post into the DB for searching. Besides posts, that is usually the next largest table.
I was answering someone's question, but I wasn't advising it haha.
Guess I should make that distinction.
Also, there is no way of getting it back (unless there is a backup, and even that would be hard to do, because it would need to be merged with the past week's stuff).
Kepler
January 18th, 2007, 11:07 AM
If we post to old threads would that make the rest of that thread searchable or would just the new posts in the thread be searchable?
Uprising
January 18th, 2007, 11:18 AM
People will have to grind it out the old fashioned way. Thread by thread, page by page.
Ketch
January 18th, 2007, 11:40 AM
the other thing is...
Normally you could Google-search a site using the "site:www.heroscapers.com search words" thing.
But we banned Google from the site... haha
InfinityMax
January 18th, 2007, 11:53 AM
HAHA! Man, that's priceless! Google is banned!
I totally understand why, though. Their bots were killing the bandwidth.
It does help a little to know what happened, though. I was wondering why a thread I knew dang well was still around would not pull up in Search.
One thing you can try, if you have the patience and know who posted what you're looking for, is to hit that person's profile and check 'view all posts by this person.' That's how I found the thread I wanted - I checked my own profile and looked through my 103 pages of post summaries.
Cavalier
January 18th, 2007, 12:41 PM
I have in fact used that technique this week... almost as painful as slicing your wrists and doing push-ups in saltwater... but not as bad as trying to find something manually.
Xotli
January 18th, 2007, 01:57 PM
I understand it from a size perspective, but it cuts the ability to effectively use the site by an order of degrees. Being unable to search old posts means that valuable information is essentially lost. :brickwall:
I think you understate the issue, personally. Not being able to search the site basically means the site is useless to quite a few people. Sure, we can still post, but I think that, just like any online community, there are always a good deal more people reading than posting, and most of those readers are here looking for information.
That they, now, can't find.
Which makes the site useless, at least for them.
Also, I can't agree that there's no way of getting it back. If the posts still exist, then the search indices can be recreated (and if the posts don't still exist, then I suppose the entire point is moot, but IMax's comment seems to indicate that happily that's not the case). I mean, it may not be easy perhaps, but it must be possible. Again, I fully volunteer my time and effort to help out if necessary. I just hate to see the site crippled in this way.
Plus I have some stuff I wanna search for. :D
Ketch
January 18th, 2007, 02:34 PM
To recreate the search indices would mean
1. first finding all 2 million + words ever typed in these forums.
2. searching every post ever created (currently 180 000+) and putting the post number into the index under that word.
To do this by programming something to parse the posts and put them into the table again would take 100% of the website's resources for a couple of days with an O(n) algorithm.
You don't realize how much time it takes to make that table, but over the course of a year the site slowly built the search index.
To do it manually would take on the order of years.
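The two steps Ketch lists amount to building an inverted index: one pass over every post, recording each distinct word against the post it appears in. A minimal Python sketch, assuming posts arrive as (post_id, text) pairs and using a deliberately simple, hypothetical tokenizer (phpBB's real word-splitting rules are more involved):

```python
import re
from collections import defaultdict

# Hypothetical tokenizer: lowercase runs of letters and digits.
# phpBB's own word-splitting rules differ; this is only for illustration.
WORD_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    return WORD_RE.findall(text.lower())

def build_index(posts):
    """Build word -> set of post_ids in a single pass over all posts.

    The work grows linearly with the total number of words in all posts
    (the O(n) mentioned above): cheap per word, but n is very large.
    """
    index = defaultdict(set)
    for post_id, text in posts:
        for word in set(tokenize(text)):  # count each word once per post
            index[word].add(post_id)
    return index

posts = [(1, "Heavy gruts are heavy"), (2, "Search is your friend")]
index = build_index(posts)
print(sorted(index["heavy"]))  # [1]
```

The loop itself is simple; the cost Ketch describes comes from running it over hundreds of thousands of posts and writing every (word, post) pair back into the database.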
Xotli
January 18th, 2007, 03:43 PM
Well, as it happens, I regularly deal with databases with millions of records, so I do have a bit of experience with these sorts of problems in general, tho not of course with phpBB in specific. If I were going to fix the problem, I would approach it in the following way:
a) Take a snapshot of the DB as is.
b) Regenerate the search records for all posts prior to the earliest post currently search-indexed. This would be done on a separate machine, to avoid hurting the site's performance.
c) Reintegrate the new records with the old.
So the site is only down during a) for sure, and maybe for c), depending on how phpBB's search tables work. And, technically, since old posts never change and since, for what needs to be done in b), we don't really give a crap about whether the DB is in a fully consistent state, you don't even actually have to take the site down for a). Technically. Although dumping all the tables would certainly slow the site down quite a bit.
Of course, if the site has regular backups (which hopefully it does), we could just use one of those for a). Anything remotely recent should be good; it would have to be pretty darned old to be too old to be useful at this point.
c) would probably be the only really bitchy part. But, again, it depends on how phpBB stores its search index tables. If the records are per word/post, then reintegrating them is no problem at all, since you're just inserting a whole new batch of records into the table. If the records are only per word, then existing records would need to be modified to include the older post info, and that would definitely get a bit hairy. Or if there's some other wacky possible way of storing the records that I haven't thought of, then we'd have to take it from there.
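The difference described for step c) can be illustrated with plain in-memory Python structures standing in for the two possible table layouts (hypothetical shapes, not phpBB's actual storage). With one row per (word, post) pair, merging is just appending rows; with one row per word holding its list of posts, every word that already exists has to have its row modified.

```python
def merge_row_per_pair(live_rows, old_rows):
    """Layout A: one row per (word, post). Merging is a plain bulk insert."""
    existing = set(live_rows)
    # Skip rows the live table already has, so re-running is harmless.
    return live_rows + [row for row in old_rows if row not in existing]

def merge_row_per_word(live_index, old_index):
    """Layout B: one row per word listing its posts. Rows must be updated."""
    merged = {word: set(posts) for word, posts in live_index.items()}
    for word, posts in old_index.items():
        if word in merged:
            merged[word] |= posts      # modify an existing row (the hairy case)
        else:
            merged[word] = set(posts)  # brand-new row
    return merged

live_rows = [("heavy", 900), ("gruts", 900)]
old_rows = [("heavy", 12), ("gruts", 12)]
print(merge_row_per_pair(live_rows, old_rows))
# [('heavy', 900), ('gruts', 900), ('heavy', 12), ('gruts', 12)]
print(merge_row_per_word({"heavy": {900}}, {"heavy": {12}, "gruts": {12}}))
```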
Again, I don't want anyone to think I'm just mindlessly bitching over here: I have the resources to help with this project if necessary. I don't really have the time, but I'd make the time because I think this is pretty darned important. I don't have _all_ the knowledge necessary, but I do have over 10 years' experience in general data munging, most of it on large datasets, and I think I could work out the phpBB-specific bits of it without too much effort.
Just throwing out ideas.
Kepler
January 18th, 2007, 03:53 PM
The site isn't useless for those wanting information. They can just re-ask all of those old questions and we can re-answer them. It will be fun. :D
Ketch
January 18th, 2007, 04:43 PM
Well, reading recent posts, it does seem to have made an impact; people are already asking questions that were obviously available before, e.g., suggested tournament rules, etc.
Good idea about the snapshot.
You'd only need the one table too (post text).
You forgot part "before-a") Ask Truth if they even want it back.
They wouldn't truck it without any consideration, they may want to change their minds,
I haven't missed it yet, you might want to ask them.
I can write the algorithm/code for you, and you can do the rest if they want it done.
Xotli
January 18th, 2007, 09:28 PM
Well, reading recent posts, it does seem to have made an impact, people are already asking questions that were obviously available before ie. suggested tournament rules, etc.
Yeah, I think losing the ability to search the old posts is a little like losing the site's "memory". Definitely something we want to get back, I'd think.
Good idea about the snapshot.
You'd only need the one table too (post text).
Well, it might be nice to have a few other tables as well ... for instance, it'd be good to have the search tables, just so I can see how they work.
You forgot part "before-a") Ask Truth if they even want it back.
They wouldn't truck it without any consideration, they may want to change their minds,
I haven't missed it yet, you might want to ask them.
Yeah, exchanged a couple PM's with truth. Another thing I was thinking is that, if the search tables are just too darn big, we could always increase the list of stopwords. That might cut down indexing on largely unnecessary words.
I can write the algorithm/code for you, and you can do the rest if they want it done.
Really? Well, yeah, that'd be very helpful.
PM coming your way.
Cavalier
January 18th, 2007, 11:59 PM
:D YEAH!!
You guys will be my heroes...again...If you can pull this off 8)
Xotli
January 19th, 2007, 12:14 AM
Well, no promises, certainly, but we're at least investigating possibilities.
Xotli
January 26th, 2007, 01:35 AM
Sorry for the delay guys; I've had a sick kid here for the past couple of days. But just wanted to let everyone know that I haven't given up; it's just taking a bit longer than I'd hoped.
Xotli
March 5th, 2007, 10:14 AM
... it's just taking a bit longer than I'd hoped.
Well, that turned out to be an understatement, eh?
So the deal is that, after several weeks of passing some nasty mutating virus amongst the four of us, then a couple more weeks of trying to get everything done that didn't get done while anywhere from one to three of us were sick at any given time, I'm finally back to working on this issue. So, don't give up hope: I'm still on the case, I'm just really really slow. :)
As of now, here's a quick status update. I have downloaded all the posts from the fora(*) and put them into a blank copy of phpBB. I then started running a great little extension I found which, handily enough, rebuilds the search index tables. So far, it's been running 15 hours. It's about 67% done. Hopefully this lets everyone know why we couldn't just do this on the site directly.
My next step(s) will be to figure out how to merge the search bits that aren't on the site with the ones that currently are. I don't think it will be that hard, but we'll see. It will probably be at least another week before this gets done, and possibly two (the kids' mother is out of town all of this coming weekend, so that may put a crimp in my plans). But I'll try to keep everyone here in the loop.
(*) Technically, all the public posts. In case anyone was worried.
Revdyer
March 5th, 2007, 10:24 AM
Xotli, no matter how long this takes, know that you have the deep appreciation of many people here, especially those of us who actually enjoy using the search function. Thank you!
Aldin
March 5th, 2007, 12:26 PM
Whoa! :shock:
Xotli - you are indeed loved and appreciated :up:
~Aldin, compiling
LilNewbie
March 5th, 2007, 02:36 PM
Xotli=Search and Rescue Expert. Thanks for your efforts, Xotli! Glad to hear you and your family are doing better.
Newb.
Xotli
March 6th, 2007, 10:15 PM
Hey, thanx for all the support, guys. That really does mean a lot (especially when I'm mainly feeling guilty for taking so long).
Had a brief scare today--my video card went all wonky on me, and I couldn't see anything. Had to reboot with no way of knowing if the rebuild was done or not. And then I had a work emergency and couldn't get back to checking on it all day. But just now I went back to the thing and it automatically just picked up where it left off and now it's chugging along like it weren't nuthin'.
So far it's been rebuilding for about 40 hours, and it thinks it has another hour to go. But I don't trust that, because apparently it gets slower and slower as time goes on ... the first batch of 50 took under 10 seconds, but these latest batches are over 3 minutes apiece. So we'll see ...
ninthdoc
March 7th, 2007, 12:11 AM
Xotli, thanks for continuing to work on this. I'll keep my fingers crossed.
Ketch
March 7th, 2007, 01:31 AM
GOOOD, I am so glad we started doing this and that you kept going, man. I was about to go do it myself since I thought you disappeared. I am dying without the search.
Xotli
March 13th, 2007, 10:17 AM
Final build time, for those who were interested, was 41.5 hrs, roughly. Thatsa lotsa posts!
As expected, I didn't have any time this past weekend ... constantly running around after the 1yo and the 8.5yo keeps one rather busy. But I hope to grab some time this week, or at the latest this weekend. Basically most of the merging will be trivial; my only worry is that I might have some overlapping word_id's, but even that shouldn't be awful.
I will probably, once I've guaranteed that there will be no collisions, start uploading the search data in small batches, to minimize impact on the site (that's how I d/l'ed all the posts, for those who are interested in the process). So searches might magically start returning more and more data as time goes on. Just FYI.
You forgot part "before-a") Ask Truth if they even want it back. ... I haven't missed it yet, ...
I am dying without the search.
Heh ... yeah, I thought you might change your mind on that one. It's frustrating not being able to refer back to those old threads that you just know are out there ...
I'm definitely working on it guys. Let's not uncross our fingers just yet, but perhaps by sometime next week ...
Revdyer
March 13th, 2007, 01:06 PM
Continued prayers and blessings upon you, Xotli.
Marduk
March 14th, 2007, 12:17 AM
Yes, as good works for the site go, this has to rank high. If nothing else, maybe it will keep some old questions from coming up again and again. And as long as I am posting, let me say 'thank you'.
Ketch
March 14th, 2007, 10:44 AM
Indeed, I am quite aware of my mind changing! haha.
I wasn't against it before, I just hadn't used it yet, but when you suddenly think of a topic you know was posted months ago, and is very useful, and then see 75 pages of posts to search through...
GaryLASQ
March 14th, 2007, 02:06 PM
it would be great if someone would get the search (keyword) table reloaded. i'll add $10 to my annual donation if you can make it happen :thumbsup:
Xotli
March 16th, 2007, 10:03 AM
Hey, I just noticed that Lil Newbie gave me a title (I'm a bit slow sometimes :wink:) ... cool!
it would be great if someone would get the search (keyword) table reloaded. i'll add $10 to my annual donation if you can make it happen :thumbsup:
We're making it happen for ya Gary. Hopefully we'll see some tangible progress soon.
I'm feeling real good about this weekend, guys. My class was cancelled today, so I can go into work early (well, earlier) so maybe I can get out earlier, the kids' mom is back in full effect, so I have tonight off(*), I got yet another project I was working on up to a respectable plateau so I can take a break from that, nobody's sick (knocking on wood) ... this could be it! I'm not making any promises, 'cause that just gets me into trouble, but I'm feeling good! Woohoo!
(*) For the curious, we alternate child responsibility throughout the weekend. I have Fri night and Sun day, she has Sat day and night. So that way we can each get some stuff done that we want to do on the weekend.
Xotli
March 18th, 2007, 03:54 PM
Just a quick note that I'm working on the search stuff today, so if your searches blow chunks (speed-wise), blame me.
Of course, searches were already blowing chunks (content-wise), so p'raps no one will notice. ;)
Xotli
March 19th, 2007, 12:29 AM
Okay, the majority of the work to restore the search capabilities is done. To understand the bits that aren't done, I'm going to give you a little boring detail on the way that phpBB does searching.
Basically there are two tables. The first is a list of words (called, appropriately enough, search_wordlist), which is just two columns: a word, and a number associated with that word (the word_id). The second is a correspondence (what we DB-types call a "join table") between words and posts; this one contains 3 columns: post_id, word_id, and a yes/no flag indicating whether the word is actually in the title of the post (it's called search_wordmatch, for the curiously inclined).
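For anyone who wants to poke at the structure, here is a rough SQLite stand-in for the two tables just described, using Python's built-in sqlite3 module. The column names (word_text, word_id, post_id, title_match) are modeled on phpBB's tables but should be treated as assumptions; the real MySQL definitions may differ in details.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_wordlist (
    word_text TEXT NOT NULL UNIQUE,   -- the word itself
    word_id   INTEGER PRIMARY KEY     -- number assigned to the word
);
CREATE TABLE search_wordmatch (       -- the "join table"
    post_id     INTEGER NOT NULL,
    word_id     INTEGER NOT NULL REFERENCES search_wordlist(word_id),
    title_match INTEGER NOT NULL DEFAULT 0  -- 1 if the word is in the title
);
CREATE INDEX idx_wordmatch_word ON search_wordmatch(word_id);
""")

conn.execute("INSERT INTO search_wordlist (word_text) VALUES ('gruts')")
word_id = conn.execute(
    "SELECT word_id FROM search_wordlist WHERE word_text = 'gruts'").fetchone()[0]
conn.execute("INSERT INTO search_wordmatch VALUES (?, ?, ?)", (42, word_id, 1))

# Which posts contain 'gruts'? Go through word_id to reach the posts.
rows = conn.execute("""
    SELECT m.post_id, m.title_match
    FROM search_wordlist w JOIN search_wordmatch m ON w.word_id = m.word_id
    WHERE w.word_text = 'gruts'
""").fetchall()
print(rows)  # [(42, 1)]
```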
So, theoretically, every single word that anyone types in any forum, ever, gets put into wordlist and a reference that that word was used in that post gets put into wordmatch. Now, that's obviously not entirely realistic, because you're going to have certain words like "the" that are going to end up in so many posts that it's just ridiculous. phpBB handles that in two ways: first, it uses a list of "stopwords" (i.e. words which never get put into wordlist at all), and secondly it keeps track of words which become overly common--those are put into wordlist initially, then removed later once it's determined that there are too many instances of them. The first technique is for English language words that are always common (like "the"); the second is for words that might be common in a particular community, but not in general (for us, I'm sure "heroscape" would be an example).
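The two pruning mechanisms can be sketched in a few lines of Python. The stopword list and the 40% cutoff below are hypothetical placeholders, not phpBB's actual values; the point is the difference between words that are never indexed and words that are indexed first and dropped later once they prove too common in this particular community.

```python
STOPWORDS = {"a", "and", "the", "of", "to"}  # hypothetical, tiny list
COMMON_FRACTION = 0.4  # hypothetical: drop words found in >40% of posts

def index_posts(posts):
    """Return word -> set(post_ids), skipping stopwords entirely."""
    index = {}
    for post_id, text in posts:
        for word in set(text.lower().split()):
            if word in STOPWORDS:  # technique 1: never indexed at all
                continue
            index.setdefault(word, set()).add(post_id)
    return index

def prune_common(index, total_posts):
    """Technique 2: remove words that turned out to be too common here."""
    too_common = {w for w, ids in index.items()
                  if len(ids) / total_posts > COMMON_FRACTION}
    for w in too_common:
        del index[w]
    return too_common

posts = [(1, "the heroscape grut"), (2, "heroscape tips"),
         (3, "heroscape and drones")]
index = index_posts(posts)
removed = prune_common(index, len(posts))
print(sorted(removed))  # ['heroscape']
print(sorted(index))    # ['drones', 'grut', 'tips']
```

Because the second check depends on how many posts have been indexed, a word can pass it on a small set of posts and fail it on the full set, which is how reindexing everything can remove words from the wordlist.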
Following me so far?
Okay, so, when I reindexed all the posts, I ended up with 3 classes of changes:
1) Some words were removed from the wordlist entirely, because while they weren't common enough in the smaller set we had before, once you considered all the posts, they were too common. There were only about 1,186 of those.
2) Some words were added to the wordlist, because they appear in the older posts, but not in the newer ones. There were 84,229 of those, resulting in 194,689 additions to wordmatch.
3) Some words were already in the wordlist, so they were unchanged from old to new, but they added entirely new wordmatch records. There were 42,719 of these words, but they were responsible for 4,164,449 additions to wordmatch. Which makes sense if you think about it, because there was a huge backlog of posts which weren't indexed at all, but since we're always talking about Heroscape, we tend to use the same words over and over again.
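Given the live index and the rebuilt one, the three classes fall out of simple set comparisons. A sketch with Python dicts standing in for the tables (word -> set of post ids); the function name and data shapes are illustrative only.

```python
def classify_changes(live, rebuilt):
    """Compare live and rebuilt indexes (word -> set of post_ids).

    Returns (removed_words, added_words, extra_matches), matching
    classes 1), 2) and 3) above.
    """
    removed = set(live) - set(rebuilt)   # class 1: now too common
    added = set(rebuilt) - set(live)     # class 2: only in older posts
    extra_matches = {}                   # class 3: known word, new posts
    for word in set(live) & set(rebuilt):
        new_posts = rebuilt[word] - live[word]
        if new_posts:
            extra_matches[word] = new_posts
    return removed, added, extra_matches

live = {"heroscape": {900}, "gruts": {901}}
rebuilt = {"gruts": {12, 901}, "nakita": {15}}
removed, added, extra = classify_changes(live, rebuilt)
print(removed, added, extra)  # {'heroscape'} {'nakita'} {'gruts': {12}}
```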
Now that you see what we're talking about, let me tell you what I did.
#1 was easy. I just uploaded a file containing the words that had been removed, then deleted them from the current wordlist and wordmatch tables.
#3 was conceptually easy--all I needed to do was upload a file containing the new wordmatch rows (the wordlist table would be unchanged) and then move those rows over into the current wordmatch table--but of course it took a very long time to accomplish. 4.2 million rows is a hell of a lot of rows, even though it's just two numbers and a yes/no flag. The file was 260MB (16MB compressed), and the web site would accept the upload in approximately 20MB batches. Each batch took about 11.5 minutes. You can do the math to see how long I spent just plain uploading. Then I had to get them into the main table, which I did in 10 batches, to make sure that I wasn't locking up the search table for too long at a time. And before I could generate the list of new wordmatch rows, and then again between uploading and inserting, I had to run a battery of checks to make sure that I was definitely only putting in the stuff that I should be, that all my numbers matched, that I wasn't creating any rows in one table that wouldn't exist in the other table, and that the entire situation hadn't changed while I was working (remember, the site was moving merrily along while I worked, so those tables were in a constant state of flux).
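As for "you can do the math": 260 MB in roughly 20 MB uploads is about 13 batches, and at about 11.5 minutes each that comes to roughly 150 minutes, or about two and a half hours of uploading alone. The second stage, copying staged rows into the live table in chunks so that no single statement holds a lock for long, might look something like the SQLite sketch below. The staging table name search_wordmatch_new, chunking by rowid, and the specific checks are assumptions for illustration, not what was actually run against the site.

```python
import sqlite3

def orphan_count(conn):
    """Staged rows whose word_id is missing from search_wordlist (want 0)."""
    return conn.execute("""
        SELECT COUNT(*) FROM search_wordmatch_new n
        WHERE NOT EXISTS (SELECT 1 FROM search_wordlist w
                          WHERE w.word_id = n.word_id)
    """).fetchone()[0]

def copy_in_batches(conn, n_batches=10):
    """Copy staged rows into search_wordmatch in roughly n_batches chunks.

    Each chunk is committed separately, so the live table is only locked
    briefly. Rows already present (same post_id and word_id) are skipped.
    Returns the number of rows added.
    """
    lo, hi = conn.execute(
        "SELECT MIN(rowid), MAX(rowid) FROM search_wordmatch_new").fetchone()
    if lo is None:
        return 0
    count_sql = "SELECT COUNT(*) FROM search_wordmatch"
    before = conn.execute(count_sql).fetchone()[0]
    step = (hi - lo) // n_batches + 1
    for start in range(lo, hi + 1, step):
        with conn:  # one short transaction per chunk
            conn.execute("""
                INSERT INTO search_wordmatch (post_id, word_id, title_match)
                SELECT n.post_id, n.word_id, n.title_match
                FROM search_wordmatch_new n
                WHERE n.rowid >= ? AND n.rowid < ?
                  AND NOT EXISTS (SELECT 1 FROM search_wordmatch m
                                  WHERE m.post_id = n.post_id
                                    AND m.word_id = n.word_id)
            """, (start, start + step))
    return conn.execute(count_sql).fetchone()[0] - before

# Tiny demo: 25 new rows plus one row the live table already has.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE search_wordlist (word_text TEXT UNIQUE, word_id INTEGER PRIMARY KEY);
CREATE TABLE search_wordmatch (post_id INTEGER, word_id INTEGER, title_match INTEGER);
CREATE TABLE search_wordmatch_new (post_id INTEGER, word_id INTEGER, title_match INTEGER);
INSERT INTO search_wordlist VALUES ('gruts', 1);
INSERT INTO search_wordmatch VALUES (900, 1, 0);
""")
conn.executemany("INSERT INTO search_wordmatch_new VALUES (?, 1, 0)",
                 [(p,) for p in list(range(1, 26)) + [900]])
assert orphan_count(conn) == 0  # every staged word_id exists in the wordlist
moved = copy_in_batches(conn)
print(moved)  # 25
```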
Now, I decided to go ahead and do #3 before #2, because a) it was conceptually easy, and b) it would give the most bang for the buck (4.2 million beats 200k any day). So I did that. And, just recently, I finished. So woohoo.
#2 is tricky. See, the problem is overlapping word_id's. My static copy of the website of course had no incoming data, so the new word_id's just got created by taking the maximum existing word_id and adding 1 (84 thousand times). Meanwhile, the real web site did have incoming data, which was of course getting the very same word_id's assigned to it, only for completely different words. Now, I was hoping that when the old search data had gotten cleared out, a "hole" had been left behind (or several holes; I would have settled for that), and I could just move my new word_id's down into the hole(s), rearrange the wordmatch rows to match the new word_id's, and call it a day. No such luck. This means that I have to .... well, to be honest, I'm not even entirely sure what I have to do. I could move all the word_id's up to a super high number, but eventually the new incoming id's will hit that number too, and then we'd have a problem. So I'm still working on it.
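One general way around this kind of collision (a sketch of the idea, not what was eventually done here) is to never carry the offline word_id values across at all: match words by their text, give genuinely new words fresh ids on the live side, and translate the offline wordmatch rows through that mapping. The in-memory version below uses hypothetical names and computes "max + 1" itself; on a live database that step would instead be left to the database's own auto-increment inside a transaction, since new posts keep claiming ids while the merge runs.

```python
def remap_word_ids(live_words, offline_words, offline_matches):
    """Translate offline word_ids into ids that are safe on the live site.

    live_words:      word_text -> word_id on the live site (updated in place)
    offline_words:   word_text -> word_id in the offline rebuild
    offline_matches: list of (post_id, offline_word_id, title_match)
    """
    next_id = max(live_words.values(), default=0) + 1
    translate = {}
    for word, old_id in offline_words.items():
        if word not in live_words:  # genuinely new word: assign a fresh id
            live_words[word] = next_id
            next_id += 1
        translate[old_id] = live_words[word]
    return [(post, translate[wid], title) for post, wid, title in offline_matches]

live = {"gruts": 5001, "drones": 5002}
offline = {"gruts": 4000, "nakita": 5001}  # 5001 collides with "drones"
matches = [(12, 4000, 0), (15, 5001, 1)]
result = remap_word_ids(live, offline, matches)
print(result)  # [(12, 5001, 0), (15, 5003, 1)]
```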
But the main thing is that the search is much better now ... for certain definitions of better. Most of the searches I tested were returning results in the general neighborhood of about 1,000. That's certainly better than it was before in terms of quantity, but of course in the search game quantity is not always the most important thing. Something I never actually noticed before was that, with this whole "every separate word gets its own id" type of deal, searching for a particular phrase is practically impossible. I strongly advise you (and anyone else you offer advice on using search) to make sure you select the "Search for all terms" radio button when searching for more than one word. Otherwise the sheer number of results can make your search useless. Even so, when you search for "heavy gruts", you probably want to find threads which contain "heavy gruts", and not necessarily threads which happen to contain the word "heavy" and the word "gruts". Tough luck.
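Because the index only records which posts contain which words, "all terms" is a set intersection, "any terms" is a union, and word order is lost entirely. A small Python illustration with made-up posts:

```python
posts = {
    1: "heavy gruts charge",
    2: "heavy rain, no gruts today",
    3: "heavy gear",
}

# Word-level index in the spirit of search_wordmatch: word -> set of post ids.
index = {}
for post_id, text in posts.items():
    for word in set(text.replace(",", "").split()):
        index.setdefault(word, set()).add(post_id)

terms = ["heavy", "gruts"]
any_terms = set().union(*(index.get(t, set()) for t in terms))
all_terms = set.intersection(*(index.get(t, set()) for t in terms))
print(sorted(any_terms))  # [1, 2, 3]
print(sorted(all_terms))  # [1, 2]

# A true phrase match needs the original text, which the index does not keep.
phrase = [pid for pid, text in posts.items() if "heavy gruts" in text]
print(phrase)  # [1]
```

Post 2 shows the limitation: it contains both words, so an "all terms" search returns it, even though it has nothing to do with heavy gruts.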
(You know, on an only tangentially related side rant, it continues to amaze me that while technology leaps forward in many areas, in some areas it just plain leaps backwards and gets stuck there. Compared to the threading and searching we had back in the day on Usenet--hell, even on CompuServe--phpBB is like a BBS run out of some kid's basement on a 1200 baud modem. But it's the industry standard these days. Go figure.)
So I think that you guys will see some major differences now. I'll ponder how to best approach handling #2, but it may take a while. Honestly, tho, it's a fairly small percentage of the total rows--search_wordmatch now contains about 5.5 million rows (yes, what I added is about 4/5 of the total), and this would tack on a mere extra 200k. Piffle, I say. So I might decide that in the end we can just live without them altogether. But then again I might not. I'm going to think about it.
But in the meantime I think things will be cooler, overall. Although I think at some point in the future, we're going to outgrow phpBB's pitiful search capabilities. Maybe, if we're lucky, we will by that time have outgrown phpBB altogether. Keeping my fingers crossed.
Questions, complaints, further clarifications?
GaryLASQ
March 19th, 2007, 12:48 AM
awesome Xotli. thanks a ton. :thumbsup:
i was actually able to search my way back to my very first post.
guess now i'll have to put my money where my mouth is. :)
Revdyer
March 19th, 2007, 07:08 AM
Many, many, thanks, Xotli.
And remember, searchers, AND is your friend! :)
truth
March 19th, 2007, 08:34 AM
You're our hero!
Aldin
March 19th, 2007, 12:05 PM
Ave Xotli!
~Aldin, roamin'
Cavalier
March 19th, 2007, 12:06 PM
You're our hero!
Amen!
yagyuninja
March 19th, 2007, 12:41 PM
Woohoo, I love search! Thanks Xotli!
Xotli
March 20th, 2007, 10:30 AM
Hey, thanks for all the props guys, especially from truth.
guess now i'll have to put my money where my mouth is. :)
Heh. Actually, that's pretty cool ... since I'm a lameass and haven't gotten my money together to make a donation to the site, at least this way I can feel like I'm contributing, albeit indirectly. (The new kid threw all our budgets out the window and we're so lame we haven't recovered yet, even though it's been a year. But he's coming off formula now, so maybe that'll help a bit.)
And remember, searchers, AND is your friend! :)
'Tis true, you can actually type "and". But I like the little radio button for a couple reasons:
1) Typing (e.g.) "heavy and gruts" when you want to look for "heavy gruts" just seems bizarre to me.
2) If you have a lot of words to look for, all the "and"s can get annoying. For instance, say I wanted to find all Aldin's welcome messages. I could type "a and laurel and and and a and hearty and handshake", but it would be simpler to just type "a laurel and a hearty handshake" and hit the radio button and be done with it.
(Note: since "a" and "and" are actually stopwords and you can't do phrase searching anyway, you could actually search for "laurel hearty handshake" and type even less.)
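The shortcut in that note works because stopwords are dropped before the index is ever consulted. A sketch with a tiny, hypothetical stopword list (phpBB ships its own, much longer one):

```python
STOPWORDS = {"a", "an", "and", "the", "of"}  # hypothetical subset

def effective_terms(query):
    """Words that actually reach the index once stopwords are dropped."""
    return [w for w in query.lower().split() if w not in STOPWORDS]

print(effective_terms("a laurel and a hearty handshake"))
# ['laurel', 'hearty', 'handshake']
```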
I've been doing so much searching (for testing purposes) that I've come up with a few tips, and a few complaints (please note: my complaints are just sort of pointing out the limitations of our search and are in no way demands that something be done about it):
:idea: Sometimes it can be helpful to tell it to show posts instead of topics. Nothing is more frustrating than to have search tell you that what you're looking for is somewhere in a 25-page thread.
:x What it should really have is a way to combine the two, so that it only shows each topic once, but then lists which individual posts within it have the search term. And maybe a threshold so that if more than maybe 4 or 5 posts in a thread have it, it just gives a note saying "too many posts to list". Or maybe it only lists individual posts for threads over 1 page long. Or something.
:x And when you do search by post, I'd rather see a portion of the text that includes the search term rather than just the beginning of the post.
:idea: Encourage newbies to spell things right! It's all fine and well to say that we don't really care about spelling, and if you write "Nikita agent" we'll still know what you're talking about ... and we will indeed. But search will not. (Likewise, if someone is looking for "Nakita agents", you're unfortunately going to have to tell them to search for "nikita agents" as well.)
:x Let me get this straight: I can search the text and title together, or I can search the text only, but I can't search only the titles? What moron over at phpBB came up with that brilliant idea? :blowup:
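The combined topic/post view wished for in the list above could be sketched like this: collect the matching posts per topic and list them individually only up to a cutoff. The cutoff of 5 and the (topic_id, post_id) data shape are hypothetical; phpBB does not offer this view.

```python
from collections import defaultdict

MAX_LISTED = 5  # hypothetical cutoff ("maybe 4 or 5 posts")

def group_hits(hits):
    """Collapse (topic_id, post_id) hits into one entry per topic.

    Topics with more than MAX_LISTED matching posts get a note instead of
    a post list. Topic order follows the first hit for each topic.
    """
    by_topic = defaultdict(list)
    for topic_id, post_id in hits:
        by_topic[topic_id].append(post_id)
    results = []
    for topic_id, post_ids in by_topic.items():
        if len(post_ids) > MAX_LISTED:
            results.append((topic_id, "too many posts to list"))
        else:
            results.append((topic_id, post_ids))
    return results

hits = [(7, 101), (7, 105)] + [(9, p) for p in range(200, 207)]
grouped = group_hits(hits)
print(grouped)  # [(7, [101, 105]), (9, 'too many posts to list')]
```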
Anyhoo ... maybe I should write up a post explaining how search works and then someone could sticky it and you guys could refer to it during newbie training. Or something.
truth
March 20th, 2007, 10:44 AM
Haven't donated anything? You of all people know how much it would have cost me to hire someone to fix the search tables. Thusly you have earned the site supporter tag. Your donation of skilled labor is very generous.
Dennys
March 20th, 2007, 11:30 AM
I know this is heresy, but could #2 be fixed with server downtime? And congrats on your spectacular accomplishment.
I think the Gencon 2008 figure should be named Xotli.....
Ketch
March 20th, 2007, 03:37 PM
Hey, thanks for all the props guys, especially from truth.
guess now i'll have to put my money where my mouth is. :)
Heh. Actually, that's pretty cool ... since I'm a lameass and haven't gotten my money together to make a donation to the site, at least this way I can feel like I'm contributing, albeit indirectly.
Haven't donated anything? You of all people know how much it would have cost me to hire someone to fix the search tables. Thusly you have earned the site supporter tag. Your donation of skilled labor is very generous.
Indeed, thanks, Xotli.
I have an idea for #2, but it doesn't seem very important.
We can talk about it later.
Xotli
March 26th, 2007, 10:17 AM
Thanx again for all the kind words, guys. Sorry I had to disappear for another week; I've been trying to get things at work ready so I can go on vacation for a week (not this week, but next).
Haven't donated anything? You of all people know how much it would have cost me to hire someone to fix the search tables. Thusly you have earned the site supporter tag. Your donation of skilled labor is very generous.
Well, I do have an idea how much one of us thieving consultants would likely charge you for such a thing, but I also suspect that had I not been around, Ketch would have stepped up to fill the void, or we might have just learned to live with not being able to search. ;) But nonetheless I really appreciate you looking at it this way, and I'm quite proud of my new badge there ... a title and a badge all with one thread! Man, I'm moving up in the world. :D
I know this is heresy, but could #2 be fixed with server downtime?
Well, yes and no. I mean, the IDs at this point are already overlapping, and taking the server down wouldn't fix that. But I suppose downtime might let us find the current max word_id, renumber all the word_ids from #2 so they start above that, insert them into the tables, and then make sure MySQL continues numbering from the new max word_id. (Although I have to admit I'm not entirely sure how to do that last bit.) But honestly, I'm not sure it's worth it. Sometimes a "word" is just a collection of letters that someone has typed ... I can tell you for sure that I've looked at some of them, and to call them words would be damned generous. I doubt anyone will be searching for them any time soon. And some are foreign words, which admittedly could be inconvenient for our Spanish, French, and German speakers around here (and I know there are a few here and there), but the majority of us won't be searching for those types of things. What I really need to do is review the word list and see how much of it is truly worth saving.
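The renumbering idea in the paragraph above can be sketched as a plain offset. This assumes that "#2" means recovered words whose old word_ids collide with IDs issued since the rebuild, and that none of those words already exist in the live word list. The tables are modelled as in-memory dicts and lists, and every name here is hypothetical.

```python
def offset_word_ids(live_max_id: int,
                    old_words: dict[str, int],
                    old_matches: list[tuple[int, int]]):
    """Shift recovered word_ids so they cannot collide with IDs issued since the rebuild.

    live_max_id: highest word_id currently in the live word list
    old_words:   word -> old word_id, for the recovered (#2) words
    old_matches: (post_id, old word_id) rows from the recovered post index
    """
    # Every old id is >= 1, so adding the live max puts them all above it.
    offset = live_max_id
    new_words = {word: old_id + offset for word, old_id in old_words.items()}
    new_matches = [(post_id, old_id + offset) for post_id, old_id in old_matches]
    # The auto-increment counter must then resume above the largest shifted id.
    next_auto_increment = max(new_words.values(), default=live_max_id) + 1
    return new_words, new_matches, next_auto_increment


print(offset_word_ids(100, {"foo": 3, "bar": 7}, [(1, 3), (2, 7)]))
```

With a live max of 100 and old IDs 3 and 7, the shifted IDs are 103 and 107, and the next auto-increment value is 108. The offset leaves gaps in the numbering, which is harmless for an index. In MySQL, the counter itself can be moved forward with `ALTER TABLE word_list AUTO_INCREMENT = 108;`, where `word_list` is a placeholder for the real word-list table.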
Another plan I've thought of is just to wait a few months and then see whether any of those words get added to the site naturally. If they do, I can adjust the word_ids and add the post index info back in. If they don't ... well, maybe if those words are never typed again, they weren't that important in the first place.
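The "wait and see" plan amounts to matching old and new IDs by the word text itself. Here is a minimal sketch, assuming the recovered word list and post index are available as a dict and a list of rows. All names are hypothetical.

```python
def restore_reappeared(live_words: dict[str, int],
                       old_words: dict[str, int],
                       old_matches: list[tuple[int, int]]):
    """Re-attach old post-index rows for words that have since re-entered the live index.

    live_words:  word -> current word_id in the live word list
    old_words:   word -> old word_id, for the words that went missing
    old_matches: (post_id, old word_id) rows from the recovered post index
    Returns (rows to re-insert using live ids, words that are still missing).
    """
    # Map old id -> live id only for words the live index now knows about.
    old_to_live = {old_id: live_words[word]
                   for word, old_id in old_words.items() if word in live_words}
    restored = [(post_id, old_to_live[old_id])
                for post_id, old_id in old_matches if old_id in old_to_live]
    still_missing = sorted(word for word in old_words if word not in live_words)
    return restored, still_missing


live = {"hello": 5, "thaelenk": 9}
old = {"thaelenk": 2, "zzxq": 4}
print(restore_reappeared(live, old, [(10, 2), (11, 4), (12, 2)]))
```

Because the mapping goes through the word text, it doesn't matter which word_id the live index happened to assign when the word reappeared.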
I have an idea for #2, but it doesn't seem very important.
We can talk about it later.
Sure, just drop me a PM.
maybe I should write up a post explaining how search works and then someone could sticky it and you guys could refer to it during newbie training. Or something.
Actually I see that Nether already did that (http://www.heroscapers.com/community/showthread.php?t=3093). Although I think he's wrong in at least one place ... the site doesn't actually do phrase searching, as I mentioned above.
I think the Gencon 2008 figure should be named Xotli.....
Heh. Yes, and he should be a lich or something. Undead skeletal wizard. Also there should be a squad of skeleton warriors for him to bond with. And ... and ...
Actually, the name "Xotli" comes from the Conan mythos.
Xotli, Lord of Terror is a great demon from the Elder Night worshipped by the Antillians. Xotli hovers, like a great black cloud with a single eye, above the Great Pyramid in Ptahuacan as hundreds of people are sacrificed each month. Their hearts are cut out, their bodies go to feed the dragons inside the pyramid, and their souls go to feed Xotli.
:twisted:
HyperactiveSloth
March 27th, 2007, 12:03 PM
I'm curious. What would happen if you just made a post somewhere containing all the missing words, let the index absorb them naturally, then used your idea of adjusting their IDs to back-index them?
Just curious. :)
netherspirit
March 27th, 2007, 12:24 PM
maybe I should write up a post explaining how search works and then someone could sticky it and you guys could refer to it during newbie training. Or something.
Actually I see that Nether already did that (http://www.heroscapers.com/community/showthread.php?t=3093). Although I think he's wrong in at least one place ... the site doesn't actually do phrase searching, as I mentioned above.
I'm open to any suggestions of changing, adding or deleting things in that thread. Just let me know what you think needs to be changed.
Xotli
March 28th, 2007, 10:48 AM
I'm curious. What would happen if you just made a post somewhere containing all the missing words, let the index absorb them naturally, then used your idea of adjusting their IDs to back-index them?
Just curious. :)
You know, I actually thought about that ... it would definitely be one suggestion. Of course, it would mean that that (irrelevant) post would constantly come up in any searches for any of those words. But perhaps after I've added the words back in, I could delete the post ... hmm ...
Also, I don't know if there's any sort of limit on post size that I might hit with a post composed entirely of over 84,000 words. Also also, a lot of the "words" are, as I mentioned before, foreign words, which tend to have foreign characters. That gave me quite a fit originally, until I realized that while I was communicating with MySQL in terms of Latin1, phpMyAdmin was (for some bizarre reason) assuming UTF8. Which mangled all the diacritics and other special chars. So I'd have to work out all the encoding issues, this time with my browser thrown in there.
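Here is a quick way to reproduce the kind of diacritic mangling described above. This sketch assumes the problem was UTF-8 bytes being read back as Latin-1. The thread doesn't say exactly which direction the conversion went, so treat this as one plausible case.

```python
def utf8_read_as_latin1(text: str) -> str:
    """Simulate UTF-8 bytes being interpreted as Latin-1 (classic mojibake)."""
    return text.encode("utf-8").decode("latin-1")


def repair(text: str) -> str:
    """Undo utf8_read_as_latin1: recover the original bytes, then decode them correctly."""
    return text.encode("latin-1").decode("utf-8")


for word in ["señor", "Français", "Müller"]:
    garbled = utf8_read_as_latin1(word)
    print(word, "->", garbled, "->", repair(garbled))
```

Each accented letter becomes two junk characters (for example, "ñ" turns into "Ã±"). As long as no bytes were lost along the way, the round trip recovers the original text.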
But really, all those issues are probably work-around-able. And, in the end, that may very well be the best way to go. But, as I say, it'll probably be a few more weeks before I have a chance to sit down and look at it.
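The post-size worry with roughly 84,000 words could likely be worked around by splitting the list across several posts. This is a minimal sketch that assumes a hypothetical limit of 10,000 characters per post. The forum's real cap isn't stated, and it might be counted in bytes rather than characters, which would matter for the foreign words.

```python
MAX_POST_CHARS = 10_000  # hypothetical; the forum's real post-size cap isn't stated


def chunk_words(words: list[str], max_chars: int = MAX_POST_CHARS) -> list[str]:
    """Greedily pack words, space-separated and in order, into posts of at most max_chars."""
    posts: list[str] = []
    current: list[str] = []
    size = 0  # length of " ".join(current)
    for word in words:
        if len(word) > max_chars:
            raise ValueError(f"word longer than post limit: {word[:20]}...")
        added = len(word) + (1 if current else 0)  # +1 for the separating space
        if current and size + added > max_chars:
            posts.append(" ".join(current))
            current, size, added = [], 0, len(word)
        current.append(word)
        size += added
    if current:
        posts.append(" ".join(current))
    return posts


print(chunk_words(["aaa", "bbb", "ccc"], max_chars=7))
```

Splitting the list this way also makes it easier to delete the seeding posts afterwards, one chunk at a time.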
I'm open to any suggestions of changing, adding or deleting things in that thread. Just let me know what you think needs to be changed.
Thanx nether. The two big things I noticed were:
a) The site doesn't (ANAICT can't) do phrase searching. So your statement:
If you are searching for a phrase such as "Movie Trailers" it will find only things that have Movie Trailers in them, not everything with the word Movie and then everything with the word Trailer.
is right in what it won't do, but it seems to imply that it will find only the phrase "movie trailers", which isn't right. It will find anything that has "movie" and also "trailers" in it, regardless of how far apart the two words are (they do have to be in the same post though).
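The behaviour described above (every term must appear somewhere in the same post, at any distance) can be sketched as a simple all-terms match over the set of words in a post. This is an illustration that assumes a basic tokenizer and exact word matching with no stemming. The forum's real indexer may split words differently.

```python
import re


def post_words(text: str) -> set[str]:
    """Split a post into the set of lowercase words an index might store (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def matches(post_text: str, terms: list[str]) -> bool:
    """A post matches when it contains every term, regardless of order or distance."""
    words = post_words(post_text)
    return all(term.lower() in words for term in terms)


query = ["movie", "trailers"]
print(matches("Trailers for the new movie are up", query))
print(matches("Movie night. Also, I posted some trailers last week.", query))
print(matches("movie trailer", query))
```

The first two posts match even though "movie trailers" never appears as a phrase. The last one doesn't match, because "trailer" and "trailers" are different words to an exact-match index, which is the same reason spelling matters.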
b) It might be nice to have a note reminding people that spelling impacts search, so do try to get those tricky ones right: Nakita, Thaelenk (which is the one that I absolutely can't get right--I had to look it up just now again), Marro (which for some reason a lot of people want to spell "Marrow"), etc.
I'll look at it in more detail a bit later and see if I have any further suggestions.
vBulletin® v3.6.9, Copyright ©2000-2013, Jelsoft Enterprises Ltd.