Thread: Searching dates within text.

  1. #1

    Thread Starter
    Hyperactive Member Krass
    Join Date
    Aug 2000
    Location
    Montreal
    Posts
    489

    Searching dates within text.

    My app crawls a couple of webpages, grabs the inner text (not the HTML) of each page, and puts it in textboxes.

    My goal is to automate DATE extraction from those texts. I think I will have to code some wicked routines to catch every date that could be found, regardless of the format:

    May 30, 2012
    May 30 2012
    May-30-2012
    May 30, 2012
    2012-04-30
    12-04-30
    12/04/30
    (and *SO* many others)

    I think that by frequently updating my code, I might end up retrieving most of the dates found in there.

    I already have some good (I think) ideas on how to approach this piece of code, but then I thought: has someone ever done that? If a routine is already available, that'd save some work. I couldn't find one on the forums.

    Any thoughts greatly appreciated!
    Chris

  2. #2
    PowerPoster
    Join Date
    Aug 2011
    Location
    B.C., Canada
    Posts
    2,887

    Re: Searching dates within text.

    use XML extraction

  3. #3
    PowerPoster
    Join Date
    Jul 2006
    Location
    Maldon, Essex. UK
    Posts
    6,334

    Re: Searching dates within text.

    I have a routine that could possibly be tweaked. It takes some input data and a 'mask' representing a date format and then attempts to convert the input to a Date variable.

    Code:
    Public Function ConvertDate(ByVal strInput As String, ByVal strMask As String) As Date
    Dim strTemp() As String
    Dim strTmask() As String
    Dim strDD As String
    Dim strMM As String
    Dim strYY As String
    Dim strDate As String
    Dim strSep As String
    Dim intI As Integer
    Dim intPos As Integer
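    '
    'strMask:
    '
    'strMask defines the date format, expressed using the
    'following items (case neutral):
    '
    ' dd = day number (with or without leading zero)
    ' mm = month number (with or without leading zero)
    ' yy = 2 digit year
    ' yyyy = 4 digit year
    ' s = separator (the single character placed between the items)
    ' mmm = month characters eg JAN
    ' lm = full month name
    ' ddds = day number followed by a qualifier eg "st" or "th"
    '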
    strMask = UCase(strMask)
    strInput = UCase(strInput)
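    'The separator is the first character of the mask that is not a letter,
    'so masks such as "mmm dd yyyy" or "yyyy-mm-dd" are handled as well
    For intI = 1 To Len(strMask)
        If Not (Mid$(strMask, intI, 1) Like "[A-Z]") Then
            strSep = Mid$(strMask, intI, 1)
            Exit For
        End If
    Next intI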
    strTemp = Split(strInput, strSep)
    strTmask = Split(strMask, strSep)
    For intI = LBound(strTmask) To UBound(strTmask)
        Select Case strTmask(intI)
            Case "DD", "DDDS"
                strDD = strTemp(intI)
            Case "MM", "MMM", "LM"
                strMM = strTemp(intI)
            Case "YY", "YYYY"
                strYY = strTemp(intI)
        End Select
    Next intI
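    'Strip any day qualifier (1ST, 2ND, 3RD, 4TH) from the day number
    intPos = InStr(strDD, "ST")
    If intPos > 0 Then
        strDD = Mid$(strDD, 1, intPos - 1)
    End If
    intPos = InStr(strDD, "ND")
    If intPos > 0 Then
        strDD = Mid$(strDD, 1, intPos - 1)
    End If
    intPos = InStr(strDD, "RD")
    If intPos > 0 Then
        strDD = Mid$(strDD, 1, intPos - 1)
    End If
    intPos = InStr(strDD, "TH")
    If intPos > 0 Then
        strDD = Mid$(strDD, 1, intPos - 1)
    End If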
    strDate = strDD & " " & strMM & " " & strYY
    ConvertDate = CDate(strDate)
    End Function
    The Mask is a combination of:
    dd = Day Number (with or without leading zero)
    ddds = Day Number followed by a Day Qualifier (eg 'st' or 'rd' or 'th')
    mm = Month Number (ie 1 to 12, with or without leading zero)
    mmm = Short Month (eg Jan, Feb)
    lm = Long Month (eg January, February)
    yy = Short Year (last 2 digits of Year)
    yyyy = Long Year (eg 2010)
    The items must all be separated by the same single character (a space or any other character)

    eg: Entering a mask such as: 'ddds lm yyyy' would indicate a date such as 1st January 2010
    or: 'dd/mm/yy' would indicate a date such as 01/01/10 or 1/1/10

    I'm thinking that, once you've found some input that might represent a date, you could call the Function repeatedly, each time with a different mask (one for each possible date format), until it returns a valid date. You'd need to add some error handling in case the input data turns out not to be a date, for example by trapping the error raised by CDate or by adding an IsDate test before assigning the return value.
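
    As a rough illustration of that idea, here is a minimal sketch. It assumes the ConvertDate function above is in the same project and that the VBScript regular expression library is installed (it normally is on Windows). FindDateCandidates, TryConvertDate, TryOneMask and DemoExtractDates are made-up helper names, and the pattern only covers a couple of the formats listed in post #1, so treat it as a starting point rather than a finished routine.

    ```vb
    ' Hypothetical helpers; ConvertDate is the routine posted above.

    ' Returns the substrings of strText that look like a date in one of two
    ' shapes: "May 30 2012" / "May-30-2012" or "2012-04-30" / "12/04/30".
    Public Function FindDateCandidates(ByVal strText As String) As Collection
        Dim objRegEx As Object
        Dim objMatch As Object
        Dim colFound As New Collection

        Set objRegEx = CreateObject("VBScript.RegExp")
        objRegEx.Global = True
        objRegEx.IgnoreCase = True
        objRegEx.Pattern = "\b[a-z]{3,9}[ \-]\d{1,2}[ \-]\d{2,4}\b" & _
                           "|\b\d{1,4}[\-/]\d{1,2}[\-/]\d{1,4}\b"
        For Each objMatch In objRegEx.Execute(strText)
            colFound.Add objMatch.Value
        Next objMatch
        Set FindDateCandidates = colFound
    End Function

    ' Tries each mask in turn. Returns True and fills dtmResult on the
    ' first mask that converts without raising an error.
    Public Function TryConvertDate(ByVal strInput As String, varMasks As Variant, _
                                   ByRef dtmResult As Date) As Boolean
        Dim varMask As Variant

        For Each varMask In varMasks
            If TryOneMask(strInput, CStr(varMask), dtmResult) Then
                TryConvertDate = True
                Exit Function
            End If
        Next varMask
    End Function

    ' Wraps a single ConvertDate call so that a run-time error (for example
    ' a Type mismatch from CDate) becomes a False return value.
    Private Function TryOneMask(ByVal strInput As String, ByVal strMask As String, _
                                ByRef dtmResult As Date) As Boolean
        On Error GoTo Failed
        dtmResult = ConvertDate(strInput, strMask)
        TryOneMask = True
        Exit Function
    Failed:
        TryOneMask = False
    End Function

    Public Sub DemoExtractDates()
        Dim varCandidate As Variant
        Dim varMasks As Variant
        Dim dtmFound As Date

        ' Illustrative masks only; extend the list as new formats turn up
        varMasks = Array("lm dd yyyy", "lm-dd-yyyy", "yy-mm-dd", "yy/mm/dd")
        For Each varCandidate In FindDateCandidates("Posted May 30 2012, updated 2012-04-30.")
            If TryConvertDate(CStr(varCandidate), varMasks, dtmFound) Then
                Debug.Print varCandidate, Format$(dtmFound, "yyyy-mm-dd")
            End If
        Next varCandidate
    End Sub
    ```

    Bear in mind that CDate is quite forgiving and depends on the regional settings, so a mask that doesn't really fit the input may still produce a date. Putting the most specific masks first in the list should reduce that risk.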

    Whether it copes with all possible date formats I don't know, but it's been working for me for a couple of years without problems. (But that may just be good fortune!)
    Last edited by Doogle; May 19th, 2012 at 12:18 AM. Reason: Typos
