Thursday, June 26, 2008
So today I sent myself from home a script I wrote to check my rankings on google.com for my home site, ListThatAuto.com. It takes my keyword list, builds Google search URLs out of it, uses sockets to connect, grabs the content, parses out the result links, checks whether each link contains the name of my site, and if it does, displays that link and its position number on the page. Pretty simple and straightforward.
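In rough terms, the checking step works like the sketch below. The saved results file, the regex, and the link format it assumes are placeholder simplifications, not the real script:

<?php
// Rough sketch of the checking step: pull the result links out of a fetched
// results page and report the position of any link that contains my site.
$site = 'listthatauto.com';
$html = file_get_contents('results.html'); // stand-in for one fetched results page

// Grab every outbound link on the page (the pattern is an assumption).
preg_match_all('/<a[^>]+href="(http[^"]+)"/i', $html, $matches);

foreach ($matches[1] as $position => $link) {
    if (stripos($link, $site) !== false) {
        echo 'Found at position ' . ($position + 1) . ": $link\n";
    }
}
?>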
Here was the problem: when running the script at home, it would time out, because each socket call to Google took about 1.1 seconds from start to finish to grab and process the content. That unfortunately limited me to about 28 keywords per run (28 at 1.1 seconds is right at PHP's default 30-second execution limit). I knew I could do better, so I ran a few home-brewed load tests and identified parts of my code that could use subtle but significant improvement. I made those changes one by one and got it down to about 0.8 seconds per keyword. Still, I was capped at about 37-39 keywords, and my personal site only has 41, so I was just a few away. After about another hour of load testing, I came to the conclusion that my code was as efficient as it was going to get.
Though it was still rather inefficient, I sent it to work anyway, and that is where the most amazing breakthrough took place. If you program in PHP, you have probably used sockets. During my tests at home, I had found that most of my time per keyword was spent just communicating with Google. I posed this to our dev team (which I am part of), we brainstormed on it for a few minutes, and we developed a theory.
We believed that in PHP, all a socket really does is open a stream, read from the source, and store the data in a buffer for the script to access at any time. That means we could send the request for the information immediately after opening the socket, move straight on to the next connection and do the same until all the connections are made, and then come back and clean up, storing and processing the content at our own pace.
I rewrote my class as a single recursive function that grabs, processes, and displays the information described above. By opening each connection and sending its request before moving on to the next one, and only retrieving the responses afterward, the new script does the entire process in about 3.5 seconds.
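To make the idea concrete, here is a minimal sketch of the two-pass trick in plain PHP. My actual rewrite is a single recursive function with the Google parsing layered on top; this loop version, with its made-up function name, just shows the ordering that makes the difference:

<?php
// Pass 1: open every socket and fire off every request without waiting.
// Pass 2: come back around and read the responses at our own pace.
function fetch_all($urls) {
    $sockets = array();

    foreach ($urls as $i => $url) {
        $parts = parse_url($url);
        $fp = fsockopen($parts['host'], 80, $errno, $errstr, 5);
        if (!$fp) {
            continue; // skip hosts we cannot reach
        }
        $path = isset($parts['path']) ? $parts['path'] : '/';
        if (isset($parts['query'])) {
            $path .= '?' . $parts['query'];
        }
        // Send the full request now, but do not read anything yet.
        fwrite($fp, "GET $path HTTP/1.0\r\nHost: {$parts['host']}\r\nConnection: close\r\n\r\n");
        $sockets[$i] = $fp;
    }

    // By the time we loop back here, the remote servers have all been
    // working in parallel, so most responses are already in the buffer.
    $responses = array();
    foreach ($sockets as $i => $fp) {
        $responses[$i] = stream_get_contents($fp);
        fclose($fp);
    }
    return $responses;
}
?>

The win is that while connection N is still being opened, servers 1 through N-1 are already generating their responses, so the waiting overlaps instead of stacking up one request at a time.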
Pretty impressive, if you ask me.
loushou.
Tuesday, June 24, 2008
Search Engine Development
Yesterday I was assigned a project to start drawing up plans for a new search engine, with a completely unique algorithm for determining page relevance and page rank. It will be attached to at least NewestMLM.com within a few months. We are spending several days coming up with an effective game plan for implementing the project, which should make the whole thing easier in the long run.
At home, I have a site, ListThatAuto.com, that is still under development, and I have built a few tools for it that I believe could be useful in developing this new search engine. They involve advanced URL stripping, content storage, and a couple of analysis tools. I should be sending them to my work email in the next couple of days, once I compile the list.
This is a big task, and I am glad to be a big part of it. I feel like this will be one of those projects that lets you be part of something much bigger than yourself, and that is what I am looking for. SEO is hitting a frontier that we can only dream of at this point, but if we want to jump in at any point, the sooner the better, right....?
loushou
Thursday, June 19, 2008
moving on to BIGGER not so much better things
After a careful analysis of the new search.jsp file from yesterday's project, I found that every bit of information I was hoping to learn by recreating this script is handled in the background by the compiled Java running under Tomcat. So all of the work I have done so far has amounted to nothing more than chasing rabbits...
Aggravated, I began to look for the source of this Java component and all its pieces, only to find that I need Subversion to get it from Apache, as the source is only held in a repository and no packages exist for it. So I downloaded and installed Subversion. After a few failed attempts at checking out the current source, I decided to HUNT for the proper syntax. I found the one-line command that seemingly EVERYONE has executed to get this source before me, only to find that on my computer it does not work.
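For reference, the general shape of that command is below. The project path is a placeholder; I am not reproducing the exact repository URL here:

svn checkout http://svn.apache.org/repos/asf/SOME_PROJECT/trunk/ local-copy/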
After a little more searching, I found one other place I could download the source from. The catch is that I would have to right-click and do a save-as routine on each of about 200 files. Thinking to myself that this is bogus, I decided to start writing a PHP script to do it for me. That is where I am leaving it today; I will continue tomorrow.
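The plan, roughly, looks like the sketch below. The index URL and the link pattern are placeholders, since the script is not finished yet:

<?php
// Rough plan: scrape the file links off the index page and save each one
// locally, instead of doing 200 right-click save-as routines by hand.
$index = 'http://example.com/source/'; // placeholder for the real index page
$html  = file_get_contents($index);

// Grab every href that looks like a source file (the pattern is a guess).
preg_match_all('/href="([^"]+\.java)"/i', $html, $matches);

foreach ($matches[1] as $href) {
    $file = basename($href);
    file_put_contents($file, file_get_contents($index . $href));
    echo "saved $file\n";
}
?>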
Loushou.
Wednesday, June 18, 2008
more JSP
So something weird happened when I came in today to continue my woeful travels through the JSP search engine stuff. All was well until I tried to view the search file again. My Quanta editor locked up and forced me to restart it, and when I did, the search.jsp file had been truncated to size 0 (for those of you not in the know, that means essentially all the data in that file went bye-bye).
So for the better portion of the day, working on the assumption that a good portion of the code we use is open source, I have been trying to locate the proper version so the file could be replaced. That wound up not working, because apparently we got this code from someone who had slightly modified it for additional functionality, changing some of the key structure of the file that is now gone. Recreating that from memory would be a feat in and of itself, so we are forced to get the file again from the company who supplied it.
Who knows how long that will take. In any event, by the time we get it back, I should have developed another solution to this problem that is killing all our listings.
loushou.
Tuesday, June 17, 2008
JSP
Well, some of our software runs off of JSP, and unfortunately it is producing unexpected results that are very inconsistent. See, we have a spider that does just that to pages on the net; after pages are spidered, they are thrown into a search tool. We have submitted many sites to the spider, and a portion of them cannot be found when we search. We know they were spidered. We know the others are showing up. Still, they remain gone. On top of that, this JSP setup uses some type of DB that is not SQL-driven, as far as I can tell, which poses a problem for me.
The problem is that all of the back-end interaction with the DB is in compiled Java; the JSPs just execute those commands and grab the results. So... we are stuck for the moment.
If anyone can figure it out, it is me.
loushou.
Friday, June 13, 2008
eBay....XML.... not a good mix
Well... I really don't know what else to say. XML should be XML, especially if it is structured properly, which this code is. The problem is that we throw the XML and the XSL template into the XSL processor, and magically certain nodes cannot be accessed from the template. If we dump the RAW XML, the nodes are there. In fact, if we use the function that returns a node's name inside a template that matches all the children of a node, then even after the XSL processor parses the document we can pull out the names of the very nodes that "don't exist". But as soon as we step into the main node where the information we need lives, it craps on itself.
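Here is a minimal PHP sketch of the debugging trick I mean; the XML is a stand-in, not eBay's actual format. One thing worth checking for anyone hitting the same wall: a default xmlns on the source document produces exactly this symptom, because an unprefixed pattern like match="Item" only matches nodes in no namespace, while name() and the * wildcard still see everything.

<?php
// The source document has a default namespace, like many API responses do.
$xml = new DOMDocument();
$xml->loadXML('<Response xmlns="urn:example"><Item><Title>test</Title></Item></Response>');

// This stylesheet prints the name of every child of the root node. It will
// happily print "Item" even while a template with match="Item" never fires.
$xslSource = <<<XSL
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/">
    <xsl:for-each select="/*/*">
      <xsl:value-of select="name()"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);
echo $proc->transformToXML($xml); // prints "Item"
?>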
Frustrating, confusing, and inconsistent.
Loushou.*frowning*
Thursday, June 12, 2008
Learn XML, XSL, and XSLT.... NOW!!!!
That is pretty much the only thing going through my mind right now. I need to actually take the time to learn these, because they are my primary weakness. I could probably avoid a large number of questions if I just mastered these few markups. Right now I look at them and, because I have not used them before, I just stare blankly and change small things hoping for results.
This must change. NOW!!!!
Loushou.