I was spending my free time this weekend checking the powerball website when i decided i wanted to use cfhttp and get the number frequency information on this page:
http://www.powerball.com/powerball/pb_frequency.asp?
matrixID=2&graph=0
The problem is I am not sure what to do next.
What I am trying to do is get into this page and grab the numbers from the table...then loop through a DB insert and populate a table. How do I do this using cfhttp?
Before we get into this, let's just talk about screen scraping for a second. Screen scraping is always a dangerous thing to do because it relies on a third-party service that may change at the drop of a hat. When scowering a page for data, we can do things like search for keywords and match patterns, which works; but, at any time, the target page can have its structure change which may or may not render our algorithm completely useless. So, use screen scraping with discretion.
That being said, let's dive right into the problem. While I don't know what database you are going to use to store the PowerBall statistics, I am going to move the PowerBall data into a manually created ColdFusion query. You can then easily take this ColdFusion query, loop over it, and insert the values into a database.
In order to get the PowerBall statistical data from the page, we are going to use a little regular expression pattern matching. From looking at the page source of the given URL, it is clear that the PowerBall statistics exist in an HTML table with set headers. We can use this knowledge to pull out every complete table and search its contents for the given headers. This will give us a sort of brute force way to find the proper data table.
Once we have this HTML data table in a variable, we are going to treat it like an XML document against which we can run some XmlSearch() calls. The hurdle with this is that most HTML is not really XML-standards compliant. So, to get around this, we are going to take our table HTML and try to strip out all elements and attributes that might violate XML standards. This will include all the Font tags and Br tags and even the table tag attributes. Once this is done, we should be left with a data string that consists only of table, tr, and td tags which can easily be parsed.
Let's take a look at that code:
Launch code in new window » Download code as text file »
Running the code, we get the following PowerBall statistics query:
| | | | ||
| | ![]() | | ||
| | | |
Works quite nicely. Just remember, the cell values in our ColdFusion query that look like empty strings are actually NULL values, but ColdFusion will treat them as empty strings. When you go to insert these values into your own database, you can either use the null="" attribute of the CFQueryParam tag to insert the nulls, or just use the column value (qPowerBall.COLUMN) to insert an empty string.
Hope that helps a bit.
Download Code Snippet ZIP File
Comments (9) | Post Comment | Ask Ben | Permalink | Other Searches | Print Page
Thanks Ben, this is awesome...I'm going to work with it want I get home.
Frank
Posted by Frank Tudor on Jan 14, 2008 at 9:35 AM
@Frank,
My pleasure.
Posted by Ben Nadel on Jan 14, 2008 at 10:06 AM
Ben, I have a mildly retarded question. How do I stuff the output into an access table? I don't even know what angle to come from at this (laugh).
Posted by Frank Tudor on Jan 14, 2008 at 7:33 PM
@Frank,
No problem. All you have to do is loop over the resultant PowerBall query and insert each record. Something like this:
<cfloop query="qPowerBall">
<cfquery name="qInsert" datasource="...">
INSERT INTO pb
(
ball_number,
frequency
) VALUES (
<cfquerypara value="#qPowerBall.ball_number#" type="cf_sql_varchar" />,
<cfquerypara value="#qPowerBall.powerball_fr#" type="cf_sql_varchar" />
);
</cfquery>
</cfloop>
Of course, you just need to adjust for the columns that you actually have. Should be good.
Posted by Ben Nadel on Jan 15, 2008 at 7:30 AM
Thanks again Ben. I'll give that a go when I get home.
Now, how about a probability algorithm based on the ball drop frequency?
HA! just kidding...
:D
Thanks Again...
Posted by Frank Tudor on Jan 15, 2008 at 9:32 AM
Just copy and paste into excel. Run your stats then.
Posted by dvd on Mar 7, 2008 at 2:10 PM
When attempting to test the code provided on my host, I get an error stating "...denied access to createobject(java)..."
What is the (or a ) workaround for this?
Thank you,
John
Posted by John H on Apr 7, 2008 at 1:13 PM
@John,
Sounds like the CreateObject() method is not allowed in your hosting environment. You might want to try CFInvoke, but to be honest, I am not sure if that can create Java objects (I hardly ever use it).
Posted by Ben Nadel on Apr 7, 2008 at 1:15 PM
it's so many things that I can learn from here... great
keep updating....
Posted by monk on Apr 16, 2008 at 11:07 PM