Page 1 of 2 12 LastLast
Results 1 to 40 of 43

Thread: Import data from multiple html files into one excel worksheet

  1. #1

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Import data from multiple html files into one excel worksheet

    I got the below macro from a old thread. I have the lines on the html page files saved on c drive, which are supposed to be in a table format but they are not and there is no separator between each column data except some undefined spaces as shown in below image, so how can i import it as a proper table .i.e. into separate columns in the excel file, also i have multiple such html files so can i import multiple files into one worksheet at one go.


    Code:
    Sub HTML_Table_To_Excel()
    
        Dim htm As Object
        Dim Tr As Object
        Dim Td As Object
        Dim Tab1 As Object
    
        'Replace the URL of the webpage that you want to download
        Web_URL = VBA.Trim(Sheets(1).Cells(1, 1))
    
        'Create HTMLFile Object
        Set HTML_Content = CreateObject("htmlfile")
    
        'Get the WebPage Content to HTMLFile Object
        With CreateObject("msxml2.xmlhttp")
            .Open "GET", Web_URL, False
            .send
            HTML_Content.body.innerHTML = .responseText
        End With
    
        Column_Num_To_Start = 1
        iRow = 2
        iCol = Column_Num_To_Start
        iTable = 10
    
        'Loop Through Each Table and Download it to Excel in Proper Format
        For Each Tab1 In HTML_Content.getElementsByTagName("table")
            With HTML_Content.getElementsByTagName("table")(iTable)
                For Each Tr In .Rows
                    For Each Td In Tr.Cells
                        Sheets(1).Cells(iRow, iCol).Select
                        Sheets(1).Cells(iRow, iCol) = Td.innerText
                        iCol = iCol + 1
                    Next Td
                    iCol = Column_Num_To_Start
                    iRow = iRow + 1
                Next Tr
            End With
            iTable = iTable + 1
            iCol = Column_Num_To_Start
            iRow = iRow + 1
        Next Tab1
    End Sub
    attached is my html file image.
    Attached Images Attached Images  

  2. #2
    PowerPoster jdc2000's Avatar
    Join Date
    Oct 2001
    Location
    Idaho Falls, Idaho USA
    Posts
    2,393

    Re: Import data from multiple html files into one excel worksheet

    Is that really an HTML file with HTML tags, or is it actually just a text file with a fixed width fond and spaces to get the data into the correct columns? If it is a text file, you would need something like this:

    Code:
    ' Adjust the column locations to fit your data
    Workbooks.OpenText _
                Filename:=strImportFile, Origin:=437, StartRow:=1, DataType:=xlDelimited, _
                TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, _
                Semicolon:=False, Comma:=True, Space:=False, Other:=False, _
                FieldInfo:=Array(Array(1, 2), Array(2, 1), Array(3, 2), Array(4, 2), Array(5, 2), _
                Array(6, 2), Array(7, 2), Array(8, 2), Array(9, 2), Array(10, 3), Array(11, 1), _
                Array(12, 2), Array(13, 2), Array(14, 2), Array(15, 2), Array(16, 1), Array(17, 2), _
                Array(18, 1), Array(19, 2), Array(20, 2), Array(21, 2), Array(22, 2), Array(23, 2), _
                Array(24, 9), Array(25, 9), Array(26, 9), Array(27, 9), Array(28, 9), Array(29, 9), _
                Array(30, 9), Array(31, 9), Array(32, 9), Array(33, 9), Array(34, 9), Array(35, 9), _
                Array(36, 9), Array(37, 9), Array(38, 9), Array(39, 9), Array(40, 9), Array(41, 9), _
                Array(42, 9), Array(43, 9), Array(44, 9), Array(45, 9), Array(46, 9), Array(47, 9), _
                Array(48, 9), Array(49, 9), Array(50, 9), Array(51, 9), Array(52, 9), Array(53, 9), _
                Array(54, 9), Array(55, 9), Array(56, 9), Array(57, 9), Array(58, 9), Array(59, 9), _
                Array(60, 9), Array(61, 9), Array(62, 9), Array(63, 9), Array(64, 9), Array(65, 9), _
                Array(66, 9), Array(67, 9), Array(68, 9), Array(69, 9), Array(70, 9), Array(71, 9), _
                Array(72, 9), Array(73, 9), Array(74, 9), Array(75, 9), Array(76, 9), Array(77, 9), _
                Array(78, 9), Array(79, 9), Array(80, 9), Array(81, 9), Array(82, 9), Array(83, 9), _
                Array(84, 9), Array(85, 9), Array(86, 9), Array(87, 9), Array(88, 9)), _
                TrailingMinusNumbers:=True
            ' Set column widths
            strSheetName = ActiveSheet.Name
            ActiveWorkbook.Worksheets(strSheetName).Columns("A:A").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("G:H").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("J:K").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("M:O").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("S:W").EntireColumn.AutoFit

  3. #3

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Quote Originally Posted by jdc2000 View Post
    Is that really an HTML file with HTML tags, or is it actually just a text file with a fixed width fond and spaces to get the data into the correct columns? If it is a text file, you would need something like this:

    Code:
    ' Adjust the column locations to fit your data
    Workbooks.OpenText _
                Filename:=strImportFile, Origin:=437, StartRow:=1, DataType:=xlDelimited, _
                TextQualifier:=xlDoubleQuote, ConsecutiveDelimiter:=False, Tab:=True, _
                Semicolon:=False, Comma:=True, Space:=False, Other:=False, _
                FieldInfo:=Array(Array(1, 2), Array(2, 1), Array(3, 2), Array(4, 2), Array(5, 2), _
                Array(6, 2), Array(7, 2), Array(8, 2), Array(9, 2), Array(10, 3), Array(11, 1), _
                Array(12, 2), Array(13, 2), Array(14, 2), Array(15, 2), Array(16, 1), Array(17, 2), _
                Array(18, 1), Array(19, 2), Array(20, 2), Array(21, 2), Array(22, 2), Array(23, 2), _
                Array(24, 9), Array(25, 9), Array(26, 9), Array(27, 9), Array(28, 9), Array(29, 9), _
                Array(30, 9), Array(31, 9), Array(32, 9), Array(33, 9), Array(34, 9), Array(35, 9), _
                Array(36, 9), Array(37, 9), Array(38, 9), Array(39, 9), Array(40, 9), Array(41, 9), _
                Array(42, 9), Array(43, 9), Array(44, 9), Array(45, 9), Array(46, 9), Array(47, 9), _
                Array(48, 9), Array(49, 9), Array(50, 9), Array(51, 9), Array(52, 9), Array(53, 9), _
                Array(54, 9), Array(55, 9), Array(56, 9), Array(57, 9), Array(58, 9), Array(59, 9), _
                Array(60, 9), Array(61, 9), Array(62, 9), Array(63, 9), Array(64, 9), Array(65, 9), _
                Array(66, 9), Array(67, 9), Array(68, 9), Array(69, 9), Array(70, 9), Array(71, 9), _
                Array(72, 9), Array(73, 9), Array(74, 9), Array(75, 9), Array(76, 9), Array(77, 9), _
                Array(78, 9), Array(79, 9), Array(80, 9), Array(81, 9), Array(82, 9), Array(83, 9), _
                Array(84, 9), Array(85, 9), Array(86, 9), Array(87, 9), Array(88, 9)), _
                TrailingMinusNumbers:=True
            ' Set column widths
            strSheetName = ActiveSheet.Name
            ActiveWorkbook.Worksheets(strSheetName).Columns("A:A").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("G:H").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("J:K").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("M:O").EntireColumn.AutoFit
            ActiveWorkbook.Worksheets(strSheetName).Range("S:W").EntireColumn.AutoFit
    Its an html file only, i had opened it in the notepad++ and taken a snapshot of couple of lines to shown the example and the spaces between the table columns are same most of the time but not always, can we change the above piece of code work with it to identify the lines by html tag names
    Last edited by abhay_547; Jan 31st, 2018 at 11:33 PM.

  4. #4
    PowerPoster jdc2000's Avatar
    Join Date
    Oct 2001
    Location
    Idaho Falls, Idaho USA
    Posts
    2,393

    Re: Import data from multiple html files into one excel worksheet

    from what I can see of that screen capture, there do not appear to be any HTML tags present. Does it look the same in Notepad (NOT Notepad++)? What happens if you start Excel and try to open the file? (Make sure that you have File Types set to ALL.) Can you open the file?

  5. #5

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Quote Originally Posted by jdc2000 View Post
    from what I can see of that screen capture, there do not appear to be any HTML tags present. Does it look the same in Notepad (NOT Notepad++)? What happens if you start Excel and try to open the file? (Make sure that you have File Types set to ALL.) Can you open the file?
    Yes, there are close to 500 .htm files on my c drive folder. when i open it in excel the data which should come in multiple columns comes in one cell.

  6. #6
    PowerPoster jdc2000's Avatar
    Join Date
    Oct 2001
    Location
    Idaho Falls, Idaho USA
    Posts
    2,393

    Re: Import data from multiple html files into one excel worksheet

    Did you open one of the files using Notepad (NOT Notepad++)? Post a SMALL screen capture of what you get so we can see the file data. Are there any HTML tags visible?

  7. #7

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Quote Originally Posted by jdc2000 View Post
    Did you open one of the files using Notepad (NOT Notepad++)? Post a SMALL screen capture of what you get so we can see the file data. Are there any HTML tags visible?
    Yes, I did open it in the Notepad (Not NOTEPAD++) and i could see the HTML tags in it.

  8. #8
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    post a sample of part of one of the html files
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  9. #9

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Quote Originally Posted by westconn1 View Post
    post a sample of part of one of the html files
    Below is the html page script and attached is the image how it looks in chrome.

    Code:
    <!DOCTYPE HTML PUBLIC - //W3C// DTD HTML 4.0 Strict//EN>
    <HTML><HEAD></HEAD><BODY>
    <style>
    table {font-family:monospace;font-size:13px}
    td {white-space:pre}
    </style>
    <TABLE MARGIN=0 BORDER=0 CELLPADDING=0 CELLSPACING=0 WIDTH=100%>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">XXX SALES CORPORATION           SALES DATE: 01/17/2018 SCSLE-ID:    ZRR        REPORT-ID: 00007040000402242018</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">XXX CORPORATION DAILY SALES ACTIVITY REPORT            SYSTEM DATE: 01/17/2018 SYSTEM TIME:    17:40:20 PAGE 85</SPAN></TD></TR>
    <TR><TD BGCOLOR="white">     &nbsp;</TD></TR>
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">XXX SALES GROUP   XXDD            CLASSIFICATION INTERNAL/OTHER</SPAN></TD></TR>
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">ACCOUNT TYPE:   D            </SPAN></TD></TR>
    <TR><TD BGCOLOR="white">     &nbsp;</TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">                          B S                  TRXN     SALE      SALE   TRXN  INFO       OPP KLC    ACCOUNT</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">     DESCRIPTION    DATE  D C    RESI   SINGL  PRICE    PRICE     AMOUNT  R EFF  M/K SEF ORGZ TYP    NUMBER</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500"></SPAN><SPAN STYLE="font-weight:500;text-decoration:underline">
    
    
    <TR><TD BGCOLOR="lightgrey">     &nbsp;</TD></TR>
    <TR><TD BGCOLOR="lightgrey">     &nbsp;</TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">    (continued)</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     14.64000   16.11500      445.00       00889    047ACDDR5</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       1     14.64000   16.11500      678.00       00889    072R14433</SPAN></TD></TR>                                                                                                                        
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     14.64000   16.11500      445.00       00889    047ACDDR5</SPAN></TD></TR>
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     14.22000   16.11500      324.00       00889    047ACDDK5</SPAN></TD></TR>
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     14.74000   16.17500      755.00       00889    047ACDDS5</SPAN></TD></TR> 																															
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     15.14000   17.21500      445.00       00889    047ACDDR5</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       1     15.14000   17.21500      218.00       00819    072R166S4</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       1     15.14000   17.21500      218.00       00889    072R166S4</SPAN></TD></TR>  
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     12.32000   16.12400      885.00       00889    047ACDDR5</SPAN></TD></TR>
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     12.12000   16.12400      764.00       00819    047AK67C8</SPAN></TD></TR>
    <TR><TD BGCOLOR="lightgrey"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     12.77000   16.17500      755.00       00889    047AKRED6</SPAN></TD></TR> 
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       2     16.23000   14.14500      442.00       00886    047AERTH3</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       1     16.23000   14.21400      461.00       00819    072SIOOP2</SPAN></TD></TR>
    <TR><TD BGCOLOR="white"><SPAN STYLE="font-weight:500">   CREDIT    01/17   B C       1     16.23000   14.21400      461.00       00889    072UYOOL6</SPAN></TD></TR>

    Below is the image how it looks in chrome, now I have 500+ *.htm files saved on my local drive similar to this and I am looking to extract data from all files and import into a worksheet in excel, i want the header rows which looks like is in a tabular format.
    Attached Images Attached Images  
    Last edited by abhay_547; Feb 9th, 2018 at 01:20 AM.

  10. #10
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    Name:  clipic.jpg
Views: 4136
Size:  52.1 KB

    as above, this reads the data correctly but has some errors with the headers, where there is some empty cells and some additional spacings, in files with large data, it would probably be faster to fix those errors than loop all the rows

    Code:
    Dim doc As MSHTML.HTMLDocument
        Open "c:\temp\547.htm" For Input As 1
        htm = Input(LOF(1), #1)
        Close 1
        Set doc = CreateObject("htmlfile")
        doc.body.innerHTML = htm
        Set t = doc.getElementsByTagName("table")(0)
        Set myclip = CreateObject("clipbrd.clipboard")
        myclip.Clear
        myclip.settext t.innerText
        Range("A6").PasteSpecial
    for information on using myclip http://www.vbforums.com/showthread.php?t=585616
    there would be other alternatives to copy to the clipboard

    it is pretty simple to build a loop through all files in a folder and add below the above data

    do you actually want the headers for each file? easy to copy all then remove the headers if not required
    Last edited by westconn1; Feb 10th, 2018 at 12:02 AM.
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  11. #11

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    I am getting the error on the setmyclip line as Activex Component can't create object. I know that I have to register regsvr32 c:\users\abhy\desktop\clipboard.dll, but I can't do this at office since we wouldn't have admin rights in office to register any dll file also I can't ask IT person to register any dll file on office machine like this since its restricted. do we have any reference which we can select in the references which will make this work. also below is what i have got for looping through files so far. And ok if it imports the header for only first file and not for rest since we are going to append the data one below other from each .htm file into one worksheet.


    Code:
    Sub GetHtmlFiles()
    'reference set to Microsoft HTML Object Library
    Dim f As String, Path As String, NextFile As String, ThisBook As String
    
    Path = "C:\users\abhy\desktop\htmldatafiles\"
    ChDir Path
    f = Dir(Path & "*.htm")
    While f <> ""
          
        Dim doc As MSHTML.HTMLDocument
        Open f For Input As 1
        htm = Input(LOF(1), #1)
        Close 1
        Set doc = CreateObject("htmlfile")
        doc.body.innerHTML = htm
        Set t = doc.getElementsByTagName("table")(0)
        Set myclip = CreateObject("clipbrd.clipboard")
        myclip.Clear
        myclip.settext t.innerText
        Range("A6").PasteSpecial
        
        MsgBox f
        
        Workbooks(f).Close SaveChanges:=False
        f = Dir()
    Wend
    
    End Sub
    Last edited by abhay_547; Feb 10th, 2018 at 07:46 AM.

  12. #12
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    do we have any reference which we can select in the references which will make this work.
    it would still have to be registered, i will figure some alternative, using a dataobject, i just find it a simple way to use the clipboard object in vba

    do you want the continued line at the top of the data? is that common to be there?
    are the header rows all the same?


    Code:
    Dim doc As MSHTML.HTMLDocument, d As DataObject, htm As String
        Set d = New DataObject
        Set doc = CreateObject("htmlfile")
        fpath = "C:\users\abhy\desktop\htmldatafiles\"
        fpath = "c:\temp\"
        f = Dir(fpath & "*.htm")
        Do While f <> ""
            Open fpath & f For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            doc.body.innerHTML = htm
            Set t = doc.getElementsByTagName("table")(0)
            d.Clear
            d.settext t.innerText
            d.PutInClipboard
            rw = Cells(Rows.Count, 1).End(xlUp).Row + 1
            If rw = 2 Then rw = 1
            Cells(rw, 1).PasteSpecial
            If rw = 1 Then   'fix headers
    
                Else  ' delete headers
                hrows = 12    ' may need adjustment
                Cells(rw, 1).Resize(hrows).EntireRow.Delete
            End If
            f = Dir
        Loop
    will require reference to msforms2, i moved the object creation out of the loop so they can just be reused
    as the html file does not contain any columns (just formatted rows), it is probably easier to just put the headers for the top of worksheet from a template file or when a new worksheet is created, then just always delete the headers, they can not easily be fixed from the html file
    the delete headers may require some adjustment, but works with the sample data file
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  13. #13

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    I just need the header from the first file of html folder, it can be ignored from rest all files since we will append data from each file one below another. Below is the snapshot highlighting the header row.

    Also I added the reference to Microsoft Forms 2.0 Object Library, it wasn't appearing in the references in excel 2010 but i used the browse button and selected it from C:\WINDOWS\system32\FM20.DLL and then it started appearing in the references but when I run the code nothing happens. It doesn't show any error as well.


    Code:
    Sub GetHTMLFiles ()
    Dim doc As MSHTML.HTMLDocument, d As DataObject, htm As String
        Set d = New DataObject
        Set doc = CreateObject("htmlfile")
        fpath = "C:\users\abhy\desktop\htmldatafiles\"
        fpath = "C:\users\abhy\Appdata\Local\Temp"
        f = Dir(fpath & "*.htm")
        Do While f <> ""
            Open fpath & f For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            doc.body.innerHTML = htm
            Set t = doc.getElementsByTagName("table")(0)
            d.Clear
            d.settext t.innerText
            d.PutInClipboard
            rw = Cells(Rows.Count, 1).End(xlUp).Row + 1
            If rw = 2 Then rw = 1
            Cells(rw, 1).PasteSpecial
            If rw = 1 Then   'fix headers
    
                Else  ' delete headers
                hrows = 12    ' may need adjustment
                Cells(rw, 1).Resize(hrows).EntireRow.Delete
            End If
            f = Dir
        Loop
    End Sub
    Attached Images Attached Images  
    Last edited by abhay_547; Feb 11th, 2018 at 05:00 AM.

  14. #14
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    but when I run the code nothing happens. It doesn't show any error as well.
    it was definitely working when i tested, imported your sample and someother htm files, without problem

    does f get a value?
    you have 2 file paths, is the 2nd one correct?
    do the other variables get a correct value as the code runs through them?
    step through the code or msgbox the variables to check the values are correct

    the results should be pasted to the active worksheet, if there is any existing data the results should be below that
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  15. #15

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. I have commented out the below temp folder fpath in the code and it worked but it imports everything into one column. Attached is the htm sample file and the excel file with macro and imported text for your reference. can you please have a look.

    Code:
    Sub GetHTMLFiles ()
    Dim doc As MSHTML.HTMLDocument, d As DataObject, htm As String
        Set d = New DataObject
        Set doc = CreateObject("htmlfile")
        fpath = "C:\users\abhy\desktop\htmldatafiles\"
        'fpath = "C:\users\abhy\Appdata\Local\Temp"
        f = Dir(fpath & "*.htm")
        Do While f <> ""
            Open fpath & f For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            doc.body.innerHTML = htm
            Set t = doc.getElementsByTagName("table")(0)
            d.Clear
            d.settext t.innerText
            d.PutInClipboard
            rw = Cells(Rows.Count, 1).End(xlUp).Row + 1
            If rw = 2 Then rw = 1
            Cells(rw, 1).PasteSpecial
            If rw = 1 Then   'fix headers
    
                Else  ' delete headers
                hrows = 12    ' may need adjustment
                Cells(rw, 1).Resize(hrows).EntireRow.Delete
            End If
            f = Dir
        Loop
    End Sub
    Attached Files Attached Files

  16. #16
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    but it imports everything into one column.
    on testing, seems like this happens in later versions, should be easy enough to fix, will post later
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  17. #17

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. just to confirm the version again, I am using excel 2010 in office and my personal pc I got excel 2013. I tested in both and the output is coming in one column

  18. #18

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    hi westconn1, any luck with the later excel version fix. thanks.

  19. #19
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    this is tested in excel 2013
    Code:
    Sub GetHTMLFiles()
    Dim doc As MSHTML.HTMLDocument, d As DataObject, htm As String
        Set d = New DataObject
        Set doc = CreateObject("htmlfile")
        fpath = "C:\users\abhy\desktop\htmldatafiles\"
        fpath = "C:\Temp\"
        f = Dir(fpath & "html extract.htm")
        Do While f <> ""
            Open fpath & f For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            doc.body.innerHTML = htm
            Set t = doc.getElementsByTagName("table")(0)
            d.Clear
            d.settext t.innerText
            d.PutInClipboard
            rw = Cells(Rows.Count, 1).End(xlUp).Row + 1
            If rw = 2 Then rw = 1
            Cells(rw, 1).PasteSpecial
            Dim r As Range
           If rw = 1 Then   'fix headers
                Cells(1, 1).Resize(10).EntireRow.Delete
                Range("a1:n1") = Array("", "", "Bs", "", "", "TRXN", "SALE", "SALE", "TRXN", "INFO", "", "OPP", "KLC", "ACCOUNT")
                Range("a2:n2") = Array("DESCRIPTION", "DATE", "Dc", "RESI", "SINGL", "PRICE", "PRICE", "AMOUNT", "REFF", "M/K", "SEF", "ORGZ", "TYP", "NUMBER")
                Cells(3, 1).Resize(Cells(Rows.Count, 1).End(xlUp).Row - 2).TextToColumns , xlDelimited, xlTextQualifierNone, , , , , True
                Else  ' delete headers
                hrows = 12    ' may need adjustment
                Cells(rw, 1).Resize(hrows).EntireRow.Delete
                Set r = Cells(rw, 1).Resize(Cells(Rows.Count, 1).End(xlUp).Row - rw)
                r.TextToColumns r, xlDelimited, xlTextQualifierNone, , , , , True
            End If
            f = Dir(fpath & "html extract.htm")
        Loop
    End Sub
    unlike last time I tested on this machine the paste did go across multiple columns, the same as previous tests on the older version of excel, I have no idea what has changed to make the difference in the results, the call to texttocolumns will not matter if the data is already in multiple columns

    you may want to remove the last 2 lines of data from each import, if it is the same as the sample file
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  20. #20

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. initially when i tried to run the above code, it didn't work but when I commented out the below line it worked, but in my folder i had only one sample htm file and it kept looping and importing the data from the same file. Ideally if it finds one file then it should import that and then stop but it keeps looping and importing the same file. can you advise what's going wrong, both the source path and temp folder path are declared as fpath probably that's what is creating the problem and when i comment out it works but then keeps importing the data from the same file again and again.

    Code:
        'fpath = "C:\Temp\" ' commented out this line

  21. #21
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    i had only one sample htm file and it kept looping and importing the data from the same file
    i did that deliberately for test as i also only had one sample file, just forgot to remove when posting, remembered after shutting down the computer

    change f = Dir(fpath & "html extract.htm") at the bottom of the loop to
    Code:
    f = Dir
    and the initial call to dir, to your file pattern as you had before, i changed that also as there were other html files in the folder, that were no use for testing
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  22. #22

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. Great. Thanks a lot for your help . It's getting imported but for the first column if there is space in the text then it will import the text into 2 or 3 columns for e.g. if the text in the first column is just CREDIT then it will import it into the Column A but if the text in the column A is CREDIT INTERNAL (INT) then it will import the CREDIT into Column A and INTERNAL in Column B and (INT) in column D this is the case only with the first column since it consists of the text with space in some pages for e.g. in 500+ htm files around 120 htm pages (randomly) would have the text "CREDIT INTERNAL (INT)" or "DEBIT INTERNAL (INT)" in the first column. apart from this for few page line items the data is not available in the date column so instead of keeping that cell blank it imports the next column data in it because of which the column order changes for around 12000 rows of data. Would it be possible to take care of such thing in the code itself or will have to do a post import formatting. Also i have added the code to delete blank rows in between and also the lines which says contiuned... but how do i add 2 conditions over there...e.g. "*continued*" and "*ENP*" how can i add this...


    Code:
    Sub GetHTMLFiles()
    Dim doc As MSHTML.HTMLDocument, d As DataObject, htm As String
        Set d = New DataObject
        Set doc = CreateObject("htmlfile")
        fpath = "C:\users\abhy\desktop\htmldatafiles\"
        f = Dir(fpath)
        Do While f <> ""
            Open fpath & f For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            doc.body.innerHTML = htm
            Set t = doc.getElementsByTagName("table")(0)
            d.Clear
            d.settext t.innerText
            d.PutInClipboard
            rw = Cells(Rows.Count, 1).End(xlUp).Row + 1
            If rw = 2 Then rw = 1
            Cells(rw, 1).PasteSpecial
            Dim r As Range
           If rw = 1 Then   'fix headers
                Cells(1, 1).Resize(10).EntireRow.Delete
                Range("a1:n1") = Array("", "", "Bs", "", "", "TRXN", "SALE", "SALE", "TRXN", "INFO", "", "OPP", "KLC", "ACCOUNT")
                Range("a2:n2") = Array("DESCRIPTION", "DATE", "Dc", "RESI", "SINGL", "PRICE", "PRICE", "AMOUNT", "REFF", "M/K", "SEF", "ORGZ", "TYP", "NUMBER")
                Cells(3, 1).Resize(Cells(Rows.Count, 1).End(xlUp).Row - 2).TextToColumns , xlDelimited, xlTextQualifierNone, , , , , True
                Else  ' delete headers
                hrows = 12    ' may need adjustment
                Cells(rw, 1).Resize(hrows).EntireRow.Delete
                Set r = Cells(rw, 1).Resize(Cells(Rows.Count, 1).End(xlUp).Row - rw)
                r.TextToColumns r, xlDelimited, xlTextQualifierNone, , , , , True
            End If
            f = Dir
        Loop
    
    On Error Resume Next
    Columns("A").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    
    With ActiveSheet
        .AutoFilterMode = False
        With Range("A1", Range("A" & Rows.Count).End(xlUp))
            .AutoFilter 1, "*continued*"       
            On Error Resume Next
            .Offset(1).SpecialCells(12).EntireRow.Delete
        End With
        .AutoFilterMode = False
    End With
    
    
    End Sub
    Last edited by abhay_547; Feb 14th, 2018 at 03:34 AM.

  23. #23
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    Would it be possible to take care of such thing in the code itself
    probably loop down the second column (or use find) for internal, then easy to fix, add the word internal to column a and delete the cell in column B

    i thought that the continued line was already take care of in the code i posted, but that assumed all the headers were the same, and if some file did not have a continued line then a row of data may have been deleted, hence the comment "may need adjustment"
    i only hade one sample file to work with that did not contain all possibilities of the data

    as far as the last line and blank line above, it would be easy to check for a blank line above the last line, then paste the new data over it, no need to delete
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  24. #24
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    i had another thought, if internal is the only word that causes the 1st column to break, then just replace that word in the original string
    add the 2nd line
    Code:
            Close 1
            htm = Replace(htm, " internal", "_internal", , , vbTextCompare)
            doc.body.innerHTML = htm
    i have no data to test with, so give it a try
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  25. #25

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Quote Originally Posted by westconn1 View Post
    i had another thought, if internal is the only word that causes the 1st column to break, then just replace that word in the original string
    add the 2nd line
    Code:
            Close 1
            htm = Replace(htm, " internal", "_internal", , , vbTextCompare)
            doc.body.innerHTML = htm
    i have no data to test with, so give it a try
    so basically, you are suggesting to find the text CREDIT and then replace it with CREDIT INTERNAL (INT) like below, am i right ?

    Code:
       Close 1
            htm = Replace(htm, " CREDIT", "CREDIT INTERNAL (INT)", , , vbTextCompare)
            doc.body.innerHTML = htm

  26. #26
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    no, i suggested finding " internal" and replacing the preceding space with an underscore so that it would be continuous with the word before it, being debit, credit or any other
    as mentioned, this would need testing

    your code, would just add more spaces, making more additional columns, you would end up with 4 columns rather than the current 2 columns
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  27. #27

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. Got it. I tested it and it seems to work. can you advise on the below 3 things as well.

    1) Right now it doesn't loop the files and import the data in the file order/sequence for e.g. I have files named as FILE1.HTM, FILE2.HTM...and so on which goes till FILE500.htm or even FILE1000.htm but when the import happens it doesn't import the data of FILE1.HTM first and then FILE2.HTM and so on, it doesn't follow the sequence. can we modify the code to import the data by looping the files in order/sequence.
    2) for the account column whereever the account number starts with Zero e.g. 05347875, it will truncate the zero and import the number only starting 5 where else in some cases even if the file number starts with zero but ends with some alphabets then it wouldn't truncate the starting zero for e.g. 0573232A3 will get imported as it is, I believe such account numbers are treated as text since they have a alphabet in it and hence they get imported fully as text where else if the account number doesn't have any alphabet in it then its treated as a number hence the first/zero is getting truncated. Can we set the account column as text so that it gets imported correctly all time.
    3) As mentioned in my earlier comment, for few page line items the data is not available in the date column so instead of keeping that cell blank it imports the next column data in the date column because of which the column order changes for around 12000 rows of data. Would it be possible to take care of such thing in the code itself.

  28. #28
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    instead of using dir to get the filename you could try like

    Code:
    x = 1
    fname = mypath & "file" & x & ".htm"
    do until fname = ""
    
        ''''all your code for each file
        x= x + 1
        fname = mypath & "file" & x & ".htm"
    loop
    if any files are missing from the number sequence, it will stop at that point

    Can we set the account column
    set column to text, has to be done before the data is put in the cell
    Code:
    range("d1").entirecolumn.numberformat = "@"
    change the D to whatever column you want


    As mentioned in my earlier comment, for few page line items the data is not available in the date column so instead of keeping that cell blank it imports the next column data in the date column
    as there are no columns at all in the html code, it is hard to change the way excel converts text to columns, possibly the easiest way would be, after import, to check where the last column is empty then insert the additional cell for those rows, but i am not sure how accurate that would be, and it would probably take quite some time to run, post some sample data of rows that do not have date and i will see if it is possible to fix the raw data, before importing, similar to the fix for internal
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  29. #29

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. I have made the changes as you suggested but now it keeps looping on the last file again and again and keeps importing its data, not sure what is going wrong since it doesn't throw any error. can you check and advise.


    Code:
    Sub GetHTMLFiles()
    Dim doc As MSHTML.HTMLDocument, d As DataObject, htm As String
        Set d = New DataObject
        Set doc = CreateObject("htmlfile")
        fpath = "C:\users\abhy\desktop\htmldatafiles\"
         x = 0
         fname = fpath & "FILE" & x & ".htm"
          Do until fname = ""
            Open fname For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            htm = Replace(htm, " CREDIT", "_CREDIT", , , vbTextCompare)
            doc.body.innerHTML = htm
            Set t = doc.getElementsByTagName("table")(0)
            d.Clear
            Range("J1").entirecolumn.numberformat = "@"
            d.settext t.innerText
            d.PutInClipboard
            rw = Cells(Rows.Count, 1).End(xlUp).Row + 1
            If rw = 2 Then rw = 1
            Cells(rw, 1).PasteSpecial
            Dim r As Range
           If rw = 1 Then   'fix headers
                Cells(1, 1).Resize(10).EntireRow.Delete
                Range("a1:n1") = Array("", "", "Bs", "", "", "TRXN", "SALE", "SALE", "TRXN", "INFO", "", "OPP", "KLC", "ACCOUNT")
                Range("a2:n2") = Array("DESCRIPTION", "DATE", "Dc", "RESI", "SINGL", "PRICE", "PRICE", "AMOUNT", "REFF", "M/K", "SEF", "ORGZ", "TYP", "NUMBER")
                Cells(3, 1).Resize(Cells(Rows.Count, 1).End(xlUp).Row - 2).TextToColumns , xlDelimited, xlTextQualifierNone, , , , , True
                Else  ' delete headers
                hrows = 12    ' may need adjustment
                Cells(rw, 1).Resize(hrows).EntireRow.Delete
                Set r = Cells(rw, 1).Resize(Cells(Rows.Count, 1).End(xlUp).Row - rw)
                r.TextToColumns r, xlDelimited, xlTextQualifierNone, , , , , True
            End If
            x= x + 1
        fname = fpath & "FILE" & x & ".htm"
    loop
    
    On Error Resume Next
    Columns("A").SpecialCells(xlCellTypeBlanks).EntireRow.Delete
    
    With ActiveSheet
        .AutoFilterMode = False
        With Range("A1", Range("A" & Rows.Count).End(xlUp))
            .AutoFilter 1, "*continued*"       
            On Error Resume Next
            .Offset(1).SpecialCells(12).EntireRow.Delete
        End With
        .AutoFilterMode = False
    End With
    
    
    End Sub

  30. #30
    PowerPoster
    Join Date
    Nov 2017
    Posts
    3,116

    Re: Import data from multiple html files into one excel worksheet

    I'm not following the suggestion of:

    Code:
    Do until fname = ""
    since, as coded, fname will never be empty. It may contain a path to a file that doesn't exist, but that's not what is being checked for.

    So, since fname is never empty, the Do-Loop will never terminate. And I'm not sure why no errors are thrown when it attempts to open and read from files that don't exist, but since you report that it keeps processing the data from the last file that exists, it looks like it is just plowing past those lines of code and re-processing the value stored in "htm", which would be the contents of the "last" file.

    The following lines of code might work better for you:

    Change the first few lines of code from:


    Code:
          Do until fname = ""
            Open fname For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
    to:

    Code:
          Do
            htm=""
            Open fname For Input As 1
            htm = Input(LOF(1), #1)
            Close 1
            If htm = "" Then Exit Do
    That will clear out the htm variable before each attempt to populate it from a file. If it encounters a filename that doesn't exist (which would mean you've reached the end of your series of files), then the htm variable will stay empty, and then the If statement will terminate the Do-Loop, which I think is what you are looking for.

  31. #31
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    my error

    should have been
    Code:
    x = 1
    fname = dir(mypath & "file" & x & ".htm")
    do until fname = ""
       Open mypath & fname For Input As 1
        ''''all your code for each file
        x= x + 1
        fname = dir(mypath & "file" & x & ".htm")
    loop
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  32. #32

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. got it. My folder consists of the htm files 0 to 653 .i.e. FILE0.htm, FILE1.htm and so on, if i don't want to import all the files which are there in the folder then I can just enter the start number of the file and end number of the file so that it imports the data from the start number file to end number file. for e.g. even though i have 653 + 0 = 657 files in the folder but i want to import only from FILE242.htm to FILE411.htm, to make this work, i have created a string for both startfilenum and endfilenum but it's not working, it again keeping looping the last file, what is going wrong.



    Code:
    Dim StartFileNum As String
    Dim EndFileNum As String
    
    StartFileNum = ThisWorkbook.Sheets("Sheet1").Range("D3").Value
    EndFileNum = ThisWorkbook.Sheets("Sheet1").Range("D4").Value
    
    x = StartFileNum
    fname = dir(mypath & "file" & x & ".htm")
    do until fname = "mypath & "file" & x & ".htm""
       Open mypath & fname For Input As 1
        ''''all your code for each file
        x= x + 1
        fname = dir(mypath & "file" & x & ".htm")
    loop

  33. #33
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    Code:
        x= x + 1
          if x > endfilenum then exit do
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  34. #34

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Ok. below is the change I made, so if I enter zero and Endfilenum as 653 then it doesn't import entire thing...may be something is going wrong, it works when I enter the page 242 to 411.

    Code:
    Dim StartFileNum As String
    Dim EndFileNum As String
    
    StartFileNum = ThisWorkbook.Sheets("Sheet1").Range("D3").Value
    EndFileNum = ThisWorkbook.Sheets("Sheet1").Range("D4").Value
    
    x = StartFileNum
    fname = dir(mypath & "file" & x & ".htm")
    do until fname = "mypath & "file" EndFilenum x & ".htm""
       Open mypath & fname For Input As 1
        ''''all your code for each file
        x= x + 1
          if x > endfilenum then exit do
        fname = dir(mypath & "file" & x & ".htm")
    loop

  35. #35
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    absolutely no idea

    what have you tried to diagnose the problem?
    i would try
    Code:
    Dim StartFileNum As String
    Dim EndFileNum As String
    
    StartFileNum = ThisWorkbook.Sheets("Sheet1").Range("D3").Value
    EndFileNum = ThisWorkbook.Sheets("Sheet1").Range("D4").Value
    
    x = StartFileNum
    fname = dir(mypath & "file" & x & ".htm")
    do until fname = "mypath & "file" EndFilenum x & ".htm""
      debug.print fname
        x= x + 1
          if x > endfilenum then exit do
        fname = dir(mypath & "file" & x & ".htm")
    loop
    this should just loop through and print the flenames, if any file in the numeric order is missing then the code will not process further files, as designed, as you now have starting and finishing points you could also try

    Code:
    for x = startfilenumber to endfilenumber
        fname = dir(mypath & "file" & x & ".htm")
        if len(fname) > 0 then
             debug.print fname
             '''' open file and do stuff
          else msgbox "file " & x & " not found
        end if
    next
    apart from the msgbox that could be removed, missing files will just be ignored
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  36. #36

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    ok. It works now as expected. Thanks a lot for all your guidance and help Really appreciate it.

    Only one thing which is still an issue is the date column coming as blank which imports the next data column inside date column due to which it changes the column order for certain number of rows. As suggested in your earlier comment, i am attaching herewith the sample file where the date column doesn't consist of data for certain rows (in few pages the date column is completely blank also). this seems to be the issue with few other columns as well in some pages e.g. RESI which comes blank for certain rows in certain number of pages (or completely blank in few pages, discovered by me while checking the data post import). I think once a fix is figured out for date column same logic can be applied for all may be if the space is treated as a character before import then it may import the character in the column.
    Attached Files Attached Files

  37. #37
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    you can test this to see if it helps with missing dates

    Code:
            Close 1
            htm = Replace(htm, " internal", "_internal", , , vbTextCompare)
            htm = Replace(htm, Space(9), "  ~  ")
            doc.body.innerHTML = htm
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  38. #38

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    Great. Its working as expected

    The only issue is that when post running the macro when I try to replace the "~" with Null it doesn't work. It says Microsoft Excel cannot find any data to replace.

    Another thing, while data is getting imported after the last column, can we input the file number for e.g. the macro has imported File256.htm then in column P can we enter number 256 or even the file256.htm text against each row which is imported from that file, just to identify the rows that which is imported from which file.
    Last edited by abhay_547; Feb 18th, 2018 at 01:02 PM.

  39. #39
    PowerPoster
    Join Date
    Dec 2004
    Posts
    25,618

    Re: Import data from multiple html files into one excel worksheet

    when I try to replace the "~" with Null
    i do not believe you can use any non-visible character, if you want it to represent a column

    i do not know which column the file name will go into, but you can try this,
    Code:
            Close 1
            htm = Replace(htm, " internal", "_internal", , , vbTextCompare)
            htm = Replace(htm, Space(9), "  ~  ")
            htm = replace(htm, "</SPAN", "   " & fname & "</SPAN")
            doc.body.innerHTML = htm
    you could always insert some additional columns, if needed, to the entire worksheet after all the files are imported

    i have always been trying to avoid looping through all the data line by line
    i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case.
    Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next

    dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part

    come back and mark your original post as resolved if your problem is fixed
    pete

  40. #40

    Thread Starter
    Hyperactive Member
    Join Date
    Sep 2009
    Posts
    295

    Re: Import data from multiple html files into one excel worksheet

    thanks. but what i mean is for e.g. if the date column has missing values for most of the rows, then instead of showing the character "~" I want to show "Null" text for those row items. and I am trying to do this post the macro run .i.e. a simple find and replace with "~" and "Null" but excel is not able to find the "~" to replace it with "Null" not sure why.

Page 1 of 2 12 LastLast

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width