Scapers Online: Discussing the online HS community
#13
To recreate the search indices would mean:
1. First finding all 2 million+ words ever typed in these forums.
2. Searching every post ever created (currently 180,000+) and putting the post number into the index under that word.

To do this by programming something to parse the posts and put them into the table again would take 100% of the website's resources for a couple of days with an O(n) algorithm. You don't realize how much time it takes to make that table, but over the course of a year the site slowly built the search index. To do it manually would take on the order of years.

::Nathan Hoel::
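As a rough illustration of the two steps listed in this post, a minimal Python sketch of building a word-to-post index might look like the following. The function names, the tokenization rule, and the shape of `posts` are assumptions made for the example; this is not phpBB's actual indexing code.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split post text into lowercase words (assumed rule: letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(posts):
    """Map each word to the set of post numbers that contain it.

    `posts` is a dict of post_number -> post_text (an assumed shape).
    """
    index = defaultdict(set)
    for post_id, text in posts.items():
        # set() so a word repeated within one post is recorded only once
        for word in set(tokenize(text)):
            index[word].add(post_id)
    return index


# Tiny made-up sample, only to show the shape of the result
posts = {
    1: "Suggested tournament rules",
    2: "Tournament this weekend",
}
index = build_index(posts)
print(sorted(index["tournament"]))
```

Each post is read once, which is the linear pass the post describes; the cost on the real site comes from doing that for every post while also writing every word/post pair into the database.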
#14
Quote:
a) Take a snapshot of the DB as is.
b) Regenerate the search records for all posts prior to the earliest post currently search-indexed. This would be done on a separate machine, to avoid hurting the site's performance.
c) Reintegrate the new records with the old.

So the site is only down during a) for sure, and maybe for c), depending on how phpBB's search tables work. And, technically, since old posts never change and since, for what needs to be done in b), we don't really give a crap about whether the DB is in a fully consistent state, you don't even actually have to take the site down for a). Technically. Although dumping all the tables would certainly slow the site down quite a bit.

Of course, if the site has regular backups (which hopefully it does), we could just use one of those for a). Anything remotely recent should be good; it would have to be pretty darned old to be too old to be useful at this point.

c) would probably be the only really bitchy part. But, again, it depends on how phpBB stores its search index tables. If the records are per word/post, then reintegrating them is no problem at all, since you're just inserting a whole new batch of records into the table. If the records are only per word, then existing records would need to be modified to include the older post info, and that would definitely get a bit hairy. Or if there's some other wacky possible way of storing the records that I haven't thought of, then we'd have to take it from there.

Again, I don't want anyone to think I'm just mindlessly bitching over here: I have the resources to help with this project if necessary. I don't really have the time, but I'd make the time because I think this is pretty darned important. I don't have _all_ the knowledge necessary, but I do have over 10 years' experience in general data munging, most of it on large datasets, and I think I could work out the phpBB-specific bits of it without too much effort.

Just throwing out ideas.
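To make the difference between the two storage layouts in c) concrete, here is a hedged Python sketch. The table name `search_match`, its columns, and both function names are hypothetical and are not taken from phpBB; SQLite stands in for whatever database the site actually runs.

```python
import sqlite3


def merge_per_word_post(live, rebuilt_rows):
    """Per word/post layout: insert the new batch, skipping pairs already present."""
    live.executemany(
        "INSERT OR IGNORE INTO search_match (word, post_id) VALUES (?, ?)",
        rebuilt_rows,
    )
    live.commit()


def merge_per_word(live_index, rebuilt_index):
    """Per-word layout: each existing record must be modified to add the older posts."""
    for word, post_ids in rebuilt_index.items():
        live_index.setdefault(word, set()).update(post_ids)


# Case 1: one row per (word, post) pair (hypothetical schema)
live = sqlite3.connect(":memory:")
live.execute(
    "CREATE TABLE search_match ("
    "word TEXT, post_id INTEGER, PRIMARY KEY (word, post_id))"
)
live.executemany(
    "INSERT INTO search_match VALUES (?, ?)",
    [("tournament", 900), ("rules", 900)],
)
rebuilt = [("tournament", 12), ("rules", 12), ("tournament", 900)]
merge_per_word_post(live, rebuilt)
rows = live.execute(
    "SELECT word, post_id FROM search_match ORDER BY word, post_id"
).fetchall()
print(rows)

# Case 2: one record per word holding all of its post numbers
live_index = {"tournament": {900}}
merge_per_word(live_index, {"tournament": {12}, "rules": {12}})
print(live_index)
```

In the first case the merge is a plain batch insert; in the second, every word that already has a record needs a read-modify-write, which is the "hairy" part.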
#16
Well, reading recent posts, it does seem to have made an impact: people are already asking questions whose answers were obviously available before, e.g. suggested tournament rules.
Good idea about the snapshot. You'd only need the one table too (post text). You forgot part "before-a)": ask Truth if they even want it back. They wouldn't truck it without any consideration; they may want to change their minds. I haven't missed it yet, but you might want to ask them. I can write the algorithm/code for you, and you can do the rest if they want it done.
::Nathan Hoel::
#17
Quote:
PM coming your way.
#18
YEAH!!
You guys will be my heroes... again... if you can pull this off.
#19
Quote:
#20
Sorry for the delay guys; I've had a sick kid here for the past couple of days. But just wanted to let everyone know that I haven't given up; it's just taking a bit longer than I'd hoped.
#21
Quote:
So the deal is that, after several weeks of passing some nasty mutating virus amongst the four of us, then a couple more weeks of trying to get everything done that didn't get done while anywhere from one to three of us were sick at any given time, I'm finally back to working on this issue. So, don't give up hope: I'm still on the case, I'm just really really slow.

As of now, here's a quick status update. I have downloaded all the posts from the fora(*) and put them into a blank copy of phpBB. I then started running a great little extension I found which, handily enough, rebuilds the search index tables. So far, it's been running 15 hours. It's about 67% done. Hopefully this lets everyone know why we couldn't just do this on the site directly.

My next step(s) will be to figure out how to merge the search bits that aren't on the site with the ones that currently are. I don't think it will be that hard, but we'll see. It will probably be at least another week before this gets done, and possibly two (the kids' mother is out of town all weekend coming up, so that may put a crimp in my plans). But I'll try to keep everyone here in the loop.

(*) Technically, all the public posts. In case anyone was worried.
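One thing the merge step may have to deal with: if the index stores words in one table with numeric IDs and word/post matches in another, an index rebuilt on a blank copy will have assigned its own IDs, so they need translating by word text before the rows can be combined. The sketch below assumes that layout purely for illustration; the data shapes and the function name are hypothetical, and the real tables would need checking first.

```python
def remap_and_merge(live_words, live_matches, new_words, new_matches):
    """Merge a rebuilt index into the live one, translating word ids by word text.

    live_words / new_words: dict of word_text -> word_id
    live_matches / new_matches: set of (word_id, post_id) pairs
    All four shapes are assumptions made for this sketch.
    """
    next_id = max(live_words.values(), default=0) + 1
    id_map = {}  # rebuilt word_id -> live word_id
    for text, new_id in new_words.items():
        if text not in live_words:
            # Word only exists in the rebuilt index: give it a fresh live id
            live_words[text] = next_id
            next_id += 1
        id_map[new_id] = live_words[text]
    for new_id, post_id in new_matches:
        live_matches.add((id_map[new_id], post_id))


# Made-up sample data: "rules" has id 2 on the live site but id 1 in the rebuild
live_words = {"tournament": 1, "rules": 2}
live_matches = {(1, 900), (2, 900)}
new_words = {"rules": 1, "glyph": 2}
new_matches = {(1, 12), (2, 12)}

remap_and_merge(live_words, live_matches, new_words, new_matches)
print(live_words)
print(sorted(live_matches))
```

If the index turns out to key matches directly on the word text, no remapping is needed and the merge reduces to a batch insert.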
#22
Xotli, no matter how long this takes, know that you have the deep appreciation of many people here, especially those of us who actually enjoy using the search function. Thank you!
#23
Whoa!
Xotli - you are indeed loved and appreciated
~Aldin, compiling

He either fears his fate too much
or his deserts are small
That dares not put it to the touch
to gain or lose it all
~James Graham
#24
Xotli=Search and Rescue Expert. Thanks for your efforts, Xotli! Glad to hear you and your family are doing better.
Newb.