Reading Excel without Excel
I have a VB.NET app which pulls some data out of an Excel file and puts it into XML.
Originally, I read from the Excel file using an OleDB connection but then when we deployed the app to the test server, it gave an error saying that some Jet database components needed to be registered on the server to read from Excel and we're not allowed to register those on the server.
Then, I rewrote it using the Microsoft.Office.Interop Excel components. Apparently, however, that needs to have Excel registered on the server in order to work and we can't install that either.
Is there another way to read data from Excel using VB.NET which doesn't require anything but the .NET Framework on the server the app is running from? It's just a few simple rows of data with no formulas or macros or anything and I just want to grab the data out. It seems to me that this shouldn't be all that complex.
Re: Reading Excel without Excel
Why can't you install Excel?
Re: Reading Excel without Excel
As for Oledb, just download the Oledb provider and install it.
Re: Reading Excel without Excel
using the oledb method with Jet, your target cpu needs to be set to x86
Re: Reading Excel without Excel
Ya, they're not letting us install anything on the server for byzantian security reasons which don't make anymore sense after they're explained. That means that I can't do the Jet registration on there to use the OleDB and I can't put Excel on there to use the Interop.
Is there any other kind of Excel component which allows reading of the data (no updates or anything are necessary) which is a self-contained thing that can just be shipped with the application in a dll or the like?
Re: Reading Excel without Excel
Quote:
Originally Posted by
Tom Sawyer
Ya, they're not letting us install anything on the server for byzantian security reasons which don't make anymore sense after they're explained. That means that I can't do the Jet registration on there to use the OleDB and I can't put Excel on there to use the Interop.
Is there any other kind of Excel component which allows reading of the data (no updates or anything are necessary) which is a self-contained thing that can just be shipped with the application in a dll or the like?
I'm not 100% familiar with this... but I don't think so.
I know that SQL Server Management Studio manipulates Excel files without needing Excel installed. But maybe it installs the providers when installing itself.
*Edit:
I just Googled it and found this. I haven't looked much into it, but maybe you can change the code to your needs.
Re: Reading Excel without Excel
Is there any way you could convert your Excel file before it gets to the Server? Save it as a .CSV maybe and read it into VB as a text file?
Re: Reading Excel without Excel
Interesting option Chris. Is there a way to read that excel file back into vb without knowing the file name exactly?
Re: Reading Excel without Excel
You can read data from MS-Excel 2007 using Open XML SDK but not prior versions.
The following link has a demo solution which shows how to read Excel sheets, ranges and defined names. I will warn you it is not simple but if you truly want to get away from automation and OleDb this is worth it outside of using a third party library such as Aspose cells (which I use).
MSDN http://msdn.microsoft.com/en-us/library/bb448854.aspx
HOW TO http://msdn.microsoft.com/en-us/library/cc850837.aspx
Code bank article http://www.vbforums.com/showthread.php?t=664975
Re: Reading Excel without Excel
If the Excel file is saved as a .csv and is either the only .csv file in that location or you know a unique part of its name or available meta data (such as date created) then you can read it into VB using thr GetFiles method and If/Secect statements.
Roughly, if the file has a unique feature that you can get a handle on then, yes, you can read it into VB without knowing its name.
Re: Reading Excel without Excel
I see. Sorry to hijack another's thread for this but how do you capture the date created when the .csv is exported, say, from a web page?
That will be the only thing I could attest to that will be unique in the case I am working with, unfortunately.
Unless you know of a way to make it save to a specific file name using a URL. :P
Re: Reading Excel without Excel
I'm not sure how you'd capture the date created that way but, if the directory you're using on the server to store the csv in is only used for that purpose (or will not ever contain csv files used for another purpose), surely the csv file you will want will be the newest one?
As for the other option (save using URL) I have no idea. How are you creating and saving the csv now?
Re: Reading Excel without Excel
Since an Excel file is just a zipped folder you should be able to unzip it & read the sheet1.xml file that is contained within. I've never done it but it seems like it might be easier than some of the other options on the board...
Re: Reading Excel without Excel
Quote:
Originally Posted by
nbrege
Since an Excel file is just a zipped folder you should be able to unzip it & read the sheet1.xml file that is contained within. I've never done it but it seems like it might be easier than some of the other options on the board...
You are correct that an Office Excel 2007 file structure can be read as a zip file containing folders and files. Unfortunately, reading and writing to one of these files is not a simple task when compared to using OleDb or early binding automation. What is critical to know before working with these files is that you should be using Microsoft’s Open XML SDK library (my examples), which I mentioned in my reply earlier. Secondly when working with the XML structure and data that if a worksheet or cell value does not exists it will not be stored in the file. For instance using methods to read cells if row 1 has values through say E1 then you will be able to read the values then we move to row 2 where we have values in A2, nothing in B2, values in C2, D2 and E2 that C2 becomes D2 and D2 becomes E2 unless you have a method that indicates the above so that when returning values say into a DataTable to display we get exactly what you expected.
Here are some example of working with Open XML
Get column names from a specific worksheet
Code:
Public Function GetSheetColumnNames( _
ByVal FileName As String, _
ByVal RelationshipId As String, _
ByVal HasHeader As Boolean) As DataTable
Dim dt As New DataTable()
If SheetContainsData(FileName, RelationshipId) = ReasonCodes.FailedToOpen Then
'My.Dialogs.ExceptionDialog("Failed to open sheet because it is open by another process.")
Return dt
End If
Using spreadSheetDocument As SpreadsheetDocument = spreadSheetDocument.Open(FileName, False)
Dim WorkBookPart As WorkbookPart = spreadSheetDocument.WorkbookPart
Dim Sheets As IEnumerable(Of Sheet) = spreadSheetDocument.WorkbookPart _
.Workbook.GetFirstChild(Of Sheets)().Elements(Of Sheet)()
Dim WorksheetPart As WorksheetPart = CType(spreadSheetDocument.WorkbookPart _
.GetPartById(RelationshipId), WorksheetPart)
Dim WorkSheet As Worksheet = WorksheetPart.Worksheet
Dim SheetData As SheetData = WorkSheet.GetFirstChild(Of SheetData)()
Dim Rows As IEnumerable(Of Row) = SheetData.Descendants(Of Row)()
If Rows.Count = 0 Then
Return dt
End If
Dim HeaderIndex As Integer = 1
Dim UseData As Boolean = True
If HasHeader Then
For Each cell As Cell In Rows.ElementAt(0)
dt.Columns.Add(TryGetCellValue(spreadSheetDocument, cell, UseData), GetType(System.String))
' NEED Cell index
Next cell
Else
For Each cell As Cell In Rows.ElementAt(0)
dt.Columns.Add("Column" & HeaderIndex.ToString, GetType(System.String))
HeaderIndex += 1
Next cell
End If
End Using
Return dt
End Function
Code to read a sheet which accounts for cells which are not stored physically in the file as mentioned above. Lastly I should mention that this code targets a specific format within a specific file which may fail with another format which is no different than working with OleDb methods in regards to how extended properties are set up i.e. IMEX, header no header etc.
Code:
''' <summary>
''' Open a sheet
''' </summary>
''' <param name="FileName">Filename containing sheet</param>
''' <param name="dt">DataTable to populate</param>
''' <param name="Marker">Column count</param>
''' <param name="SheetName">Sheet name via RelationshipId</param>
''' <remarks></remarks>
Public Function PopulateSheet( _
ByVal FileName As String, _
ByRef dt As DataTable, _
ByVal Marker As Integer, _
ByVal SheetName As String) As String
Using Document = SpreadsheetDocument.Open(FileName, False)
Dim WorkBookPart = Document.WorkbookPart
Dim WorkBook = WorkBookPart.Workbook
Dim Sheets = WorkBook.Descendants(Of Sheet)()
Dim SheetInfo = (From T In Sheets Where T.Name = SheetName).FirstOrDefault
Dim WorkSheetPart = CType(WorkBookPart.GetPartById(SheetInfo.Id), WorksheetPart)
Dim SharedStringPart = WorkBookPart.SharedStringTablePart
Dim Values = SharedStringPart.SharedStringTable.Elements(Of SharedStringItem)().ToArray()
Dim Cells = WorkSheetPart.Worksheet.Descendants(Of Cell)()
Dim ColumnIndex As Integer = 0
Dim Row As DataRow = Nothing
Dim UseData As Boolean = False
For Each cell As Cell In Cells
' Gets the column index of the cell with data
Dim cellColumnIndex As Integer = CInt(Fix(GetColumnIndexFromName(GetColumnName(cell.CellReference))))
If cellColumnIndex = 0 Then
Row = dt.NewRow
End If
If ColumnIndex < cellColumnIndex Then
Do
ColumnIndex += 1
Loop While ColumnIndex < cellColumnIndex
End If
If cellColumnIndex <= Marker Then
Row(cellColumnIndex) = TryGetCellValue(Document, cell, UseData)
End If
If Marker = 1 Then
If UseData Then
dt.Rows.Add(Row)
End If
ElseIf cellColumnIndex >= Marker Then
dt.Rows.Add(Row)
End If
ColumnIndex += 1
Next
End Using
End Function
Using Open XML is but one tool in a toolbox and when used for the right reason is great but for the wrong reason can be a mess.
Reading say 20,000 rows by five columns with OleDb is less than five seconds while using standard Open XML SDK it might take 30 to 60 seconds reading in all rows and cells. Of course with optimization this can be faster yet that is more code.