-
May 18th, 2012, 07:08 PM
#1
Thread Starter
Hyperactive Member
Searching dates within text.
My app crawls a couple of webpages, grabs the inner text (not the HTML) of each page, and puts it in textboxes.
My goal is to automate DATE extraction from those texts. I think I will have to code some wicked routines to catch every date that could be found, regardless of the format:
May 30, 2012
May 30 2012
May-30-2012
May 30, 2012
2012-04-30
12-04-30
12/04/30
(and *SO* many others)
I think that by frequently updating my code, I might end up retrieving most of the dates found in there.
I already have some good (I think) ideas on how to approach this piece of code, but then I thought: has anyone ever done that? If a routine is already available, that'd save some work. I couldn't find one on the forums.
Any thoughts greatly appreciated!
-
May 18th, 2012, 11:53 PM
#2
Re: Searching dates within text.
-
May 19th, 2012, 12:14 AM
#3
Re: Searching dates within text.
I have a routine that could possibly be tweaked. It takes some input data and a 'mask' representing a date format and then attempts to convert the input to a Date variable.
Code:
Public Function ConvertDate(ByVal strInput As String, ByVal strMask As String) As Date
    Dim strTemp() As String
    Dim strTmask() As String
    Dim strDD As String
    Dim strMM As String
    Dim strYY As String
    Dim strDate As String
    Dim strSep As String
    Dim varSuffix As Variant
    Dim intI As Integer
    Dim intPos As Integer
    '
    ' strMask defines the date format (case neutral) using these tokens,
    ' joined by a single separator character:
    '
    '   dd   = day number
    '   ddds = day number followed by "st", "nd", "rd" or "th"
    '   mm   = month number
    '   mmm  = month characters, e.g. JAN
    '   lm   = full month name
    '   yy   = 2 digit year
    '   yyyy = 4 digit year
    '
    ' The arguments are ByVal so the caller's strings are not upper-cased.
    strMask = UCase$(strMask)
    strInput = UCase$(strInput)
    ' The separator is the first non-letter in the mask. This also covers
    ' masks that do not start with a two-letter token, e.g. "yyyy-mm-dd".
    For intI = 1 To Len(strMask)
        If Not (Mid$(strMask, intI, 1) Like "[A-Z]") Then
            strSep = Mid$(strMask, intI, 1)
            Exit For
        End If
    Next intI
    strTemp = Split(strInput, strSep)
    strTmask = Split(strMask, strSep)
    ' Pick out the day, month and year by their position in the mask.
    For intI = LBound(strTmask) To UBound(strTmask)
        Select Case strTmask(intI)
            Case "DD", "DDDS"
                strDD = strTemp(intI)
            Case "MM", "MMM", "LM"
                strMM = strTemp(intI)
            Case "YY", "YYYY"
                strYY = strTemp(intI)
        End Select
    Next intI
    ' Strip the day qualifier, e.g. "21ST" -> "21", "2ND" -> "2".
    For Each varSuffix In Array("ST", "ND", "RD", "TH")
        intPos = InStr(strDD, varSuffix)
        If intPos > 0 Then
            strDD = Left$(strDD, intPos - 1)
        End If
    Next varSuffix
    ' CDate raises a run-time error if the rebuilt string is not a date.
    ' It parses according to the regional settings of the machine.
    strDate = strDD & " " & strMM & " " & strYY
    ConvertDate = CDate(strDate)
End Function
The mask is a combination of:
dd = Day number (with or without leading zero)
ddds = Day number followed by a day qualifier (i.e. 'st', 'nd', 'rd' or 'th')
mm = Month number (i.e. 1 to 12, with or without leading zero)
mmm = Short month (e.g. Jan, Feb)
lm = Long month (e.g. January, February)
yy = Short year (last 2 digits of the year)
yyyy = Long year (e.g. 2010)
Each item must be separated by a space or other character.
e.g. a mask such as 'ddds lm yyyy' would indicate a date such as 1st January 2010,
and 'dd/mm/yy' would indicate a date such as 01/01/10 or 1/1/10.
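To make the masks a bit more concrete, here is a small sketch of how the function might be called. DemoConvertDate is a made-up name for illustration, the sketch sticks to masks with a month name, and it assumes English regional settings, since CDate has to recognise the month text.

```vb
' Hypothetical demo routine; paste it into the same module as ConvertDate.
Public Sub DemoConvertDate()
    Dim dtmValue As Date

    ' Day with qualifier, long month name, four-digit year.
    dtmValue = ConvertDate("1st January 2010", "ddds lm yyyy")
    ' Format$ keeps the output independent of the system's short date format.
    Debug.Print Format$(dtmValue, "yyyy-mm-dd")    ' should print 2010-01-01

    ' Plain day, short month name, four-digit year.
    dtmValue = ConvertDate("30 May 2012", "dd mmm yyyy")
    Debug.Print Format$(dtmValue, "yyyy-mm-dd")    ' should print 2012-05-30
End Sub
```

Both calls raise a run-time error instead of returning if the input does not fit the mask, so real code would want an error handler around them.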
I'm thinking that, once you've found some input that might represent a date, you could call the Function multiple times with a different mask (one for each possible date format) until it returns a valid date. You'd need to add some error handling in case the input data turns out not to be a date. (Possibly by adding an IsDate test prior to assigning the return value).
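For the "found some input that might represent a date" part, one possible approach is a regular expression that pulls out candidate substrings first. The sketch below is only an assumption about how that could be done: FindDateCandidates is a hypothetical helper, it uses the VBScript.RegExp object through late binding, and the pattern deliberately covers just the month-first and all-numeric shapes listed in the first post.

```vb
' Hypothetical helper: returns a Collection of substrings that look like dates.
' The pattern is an illustrative assumption, not a complete date grammar.
Public Function FindDateCandidates(ByVal strText As String) As Collection
    Dim objRegEx As Object
    Dim objMatch As Object
    Dim colFound As New Collection

    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Global = True          ' find every match, not just the first
    objRegEx.IgnoreCase = True
    ' Alternative 1: month name, day, year   e.g. May 30, 2012 / May-30-2012
    ' Alternative 2: three numeric groups    e.g. 2012-04-30 / 12/04/30
    objRegEx.Pattern = "\b[a-z]{3,9}[ -]\d{1,2}(st|nd|rd|th)?,?[ -]\d{2,4}\b" & _
                       "|\b\d{1,4}[-/]\d{1,2}[-/]\d{1,4}\b"

    For Each objMatch In objRegEx.Execute(strText)
        colFound.Add objMatch.Value
    Next objMatch

    Set FindDateCandidates = colFound
End Function
```

The pattern will also pick up things that are not dates (any word followed by two numbers, for example), which is fine for this purpose: each candidate still has to survive the conversion step before it counts.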
Whether ConvertDate copes with every possible date format I don't know, but it has been working for me for a couple of years without problems. (That may just be good fortune!)
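The "call it multiple times with a different mask" idea could be sketched roughly like this. TryConvertDate is a hypothetical wrapper, not part of the routine above, and it traps the run-time error from ConvertDate rather than using an IsDate test.

```vb
' Hypothetical wrapper: tries each mask in turn and stops at the first one
' that converts without error. Returns True and fills dtmResult on success.
Public Function TryConvertDate(ByVal strInput As String, ByVal varMasks As Variant, _
                               ByRef dtmResult As Date) As Boolean
    Dim varMask As Variant
    Dim strMask As String

    On Error Resume Next
    For Each varMask In varMasks
        strMask = varMask           ' ConvertDate expects a String argument
        Err.Clear
        dtmResult = ConvertDate(strInput, strMask)
        If Err.Number = 0 Then
            TryConvertDate = True
            Exit Function
        End If
    Next varMask
    ' Falling out of the loop leaves the return value as False.
End Function
```

A call might look like `If TryConvertDate("30 May 2012", Array("ddds lm yyyy", "dd mmm yyyy", "dd/mm/yyyy"), dtmFound) Then ...`. The order of the masks matters: an ambiguous input such as 12-04-30 will be accepted by whichever matching mask comes first.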
Last edited by Doogle; May 19th, 2012 at 12:18 AM.
Reason: Typos