[RESOLVED] Copying specific values from a .txt file
Hey, I've attached my macro as it is now.
The thing I want to do is search through a folder full of .txt files (with different content) and extract some information. I can manage all of it, but it does not feel very robust.
I want a more dynamic/easier way of extracting (if there is one) the info I want, its a mixture of textinfo and numbers. And I want it reported in an excel.docx. Lets say four pieces of info from the text file, reported in a range(cells(1,1),cells(4,2)) area, with a reference in coloumn 1, and the extracted info in coloumn 2.
Main question: In what way can I easily identify the parts I want to extract from the text files? The info I want might not be in the same place in every file, which makes my approach fail.
All help appreciated!
Cheers!
Code below, and attached.
Code:
Sub AllWorkbooks()
Dim MyFolder As String 'Path collected from the folder picker dialog
Dim MyFile As String 'Filename obtained by DIR function
Dim wbk As Workbook 'Used to loop through each workbook
On Error Resume Next
Application.ScreenUpdating = False
'Opens the folder picker dialog to allow user selection
With Application.FileDialog(msoFileDialogFolderPicker)
.Title = "Please select a folder"
.Show
.AllowMultiSelect = False
If .SelectedItems.Count = 0 Then 'If no folder is selected, abort
MsgBox "You did not select a folder"
Exit Sub
End If
MyFolder = .SelectedItems(1) & "\" 'Assign selected folder to MyFolder
End With
MyFile = Dir(MyFolder) 'DIR gets the first file of the folder
'Loop through all files in a folder until DIR cannot find anymore
y = 1
Do While MyFile <> “”
'Opens the file and assigns to the wbk variable for future use
Set wbk = Workbooks.Open(Filename:=MyFolder & MyFile)
'######### THIS IS WHAT i WANT TO IMPROVE ############################
Sheets(1).Cells(2, 1).Copy 'copys the selection I'm after
Workbooks("åpne tekstfiler.xlsm").Activate 'opens any sheet you want to store the data
Sheets(1).Cells(y, 2).Select 'choose cell to paste
ActiveSheet.Paste
fx = "B" & y
formel1 = "=MID(" & fx & ",33,10)"
Cells(y, 1).Select
Cells(y, 1).Formula = formel1
Dim myval
myval = Val(Cells(y, 1))
y = y + 1
wbk.Close savechanges:=True
MyFile = Dir 'DIR gets the next file in the folder
Loop
'the numbers I get from the formula gives an error naming it as text or appostrophe in it... I want it to be numbers, so I do the below to fix (bettersolution?)
Range("a:a").Copy
Range("a:a").PasteSpecial xlPasteValues
Application.CutCopyMode = False
Range("J1:J17").Formula = "=Value(A1)" 'formatted as text, need to format as number
Range("j:j").Copy
Range("j:j").PasteSpecial xlPasteValues
Application.CutCopyMode = False
Range("a:a").Value = Range("j:j").Value
Range("j:j").Clear
'######################################################################################
Application.ScreenUpdating = True
End Sub
Set wbk = Workbooks.Open(Filename:=MyFolder & MyFile)
if these are text files why open as workbooks?
Code:
'######### THIS IS WHAT i WANT TO IMPROVE ############################
wbk.Sheets(1).Cells(2, 1).Copy 'copys the selection I'm after
'Workbooks("åpne tekstfiler.xlsm").Activate 'opens any sheet you want to store the data
with mainsht
.Cells(y, 2).pastespecial
fx = "B" & y
formel1 = "=MID(" & fx & ",33,10)"
'Cells(y, 1).Select
.Cells(y, 1).Formula = formel1
end with
Dim myval
myval = Val(Cells(y, 1)) ' i can not see what this is ever used for
y = y + 1
wbk.Close savechanges:=True 'why save changes, nothing is changed in the file?
before the loop set a worksheet object for the target workbook like set mainsht = Workbooks("åpne tekstfiler.xlsm").sheets(1)
try change the formula to formel1 = "=value(MID(" & fx & ",33,10))"
i do my best to test code works before i post it, but sometimes am unable to do so for some reason, and usually say so if this is the case. Note code snippets posted are just that and do not include error handling that is required in real world applications, but avoid On Error Resume Next
dim all variables as required as often i have done so elsewhere in my code but only posted the relevant part
come back and mark your original post as resolved if your problem is fixed
pete
Main question: In what way can I easily identify the parts I want to extract from the text files? The info I want might not be in the same place in every file, which makes my approach fail.
Is there any identifying information stored in the rows you want to pull data from? You will need something to search for so that that needed information can be extracted. If possible, post a sample of your text file (you do not need to post all of it, just a sample that shows needed rows and unneeded rows).
In the attached text file, I will want this information extracted and sorted "nicely" in excel for further use.
The Underlined information is what I want extracted:
as an example from the first part of the text file.
Handelsbanken INVOICE Page 1 / 1 NORDIC CUSTODY SERVICES OSLO
Minimum fee NOK 1 1*000.00 1*000.00
Total NOK 1*000.00
Please be informed that we will debit your account 83960235013on expiry date 7.7.
third "part" of the text file.
Svenska Handelsbanken AB Svenska Handelsbanken Telephone
Registered office: Stockholming Business ID: NO 971 17 Fax 22 39 70
Registered in the bank register Postboks 1342 VIKA +47 22 39 72
Swedish Financial Supervisory A N-0113 Oslo SWIFT
Business organisation number: 5 Norway HANDNOKK
Average value Basis point Fee Safekeeping fee 1 804 301 (0.0050%) 7*414.94
7*414.94 Transaction fee
#528 NOK 35. 18*480.00
18*480.00 Transaction fee Repair fees #13 NOK 50. 650.00
650.00 Total NOK 26*544.94
Looks quite messy when I put it inhere, but there is a system to it, if you look in the .txt file its easier to get an overview.
All I'm looking for is a way to extract the data directly from the text file, Searching it for the different "fee's" and I need a name.
If necessary for the sorting you might have to split the different "parts" of the textfile?
Is your attached file a single file that needs to be processed or is it an example of three different file formats where the real file would be one of the shown formats.
I am assuming that it represents three different formats.
Last edited by TnTinMN; Jul 7th, 2014 at 03:35 PM.
Working on the assumption that there are three file formats ( the assumed file types are defined in the comments), I created the below code to read the text file into a Collection where each item in the collection is a line from the file. The code determines the file type.
I am not going to write all the code and expect that you can write the code needed to find and extract the needed information based on knowing the file type and by searching the lines for key phases that identify the line that contains the phrase.
Searching and line parsing is a basic staple file processing and should not be that difficult for you. If you need assistance understanding what I wrote, feel free to ask. The main method in the below code is the "Test" subroutine.
VBA Code:
Option Explicit
Public Enum SourceType
TypeI = 0
TypeIIA = 1
TypeIIB = 2
End Enum
Sub Test()
Dim numLines As Integer
Dim lines As Collection
Set lines = ReadAllLines(filePath:="path to file", trimLines:=True)
If lines.count > 0 Then
Dim fileType As SourceType
fileType = DetermineType(lines)
Select Case fileType
Case SourceType.TypeI
' Stop
'process this file
Case SourceType.TypeIIA
' Stop
'process this file
Case SourceType.TypeIIB
' Stop
'process this file
End Select
End If 'lines.count > 0
End Sub
Private Function ReadAllLines(filePath As String, Optional trimLines As Boolean = False) As Collection
Dim fso As Object
Dim txtStream As Object
Const IOMode_ForReading As Integer = 1
Const TriState_False As Integer = 0
Set fso = CreateObject("Scripting.FileSystemObject")
Set txtStream = fso.OpenTextFile(filePath, IOMode_ForReading, False, TriState_False)
Dim numLines As Integer
Dim lines As New Collection
Do While Not txtStream.AtEndOfStream
If trimLines Then
lines.Add (trim$(txtStream.ReadLine))
Else
lines.Add (txtStream.ReadLine)
End If
Loop
Set ReadAllLines = lines
End Function
Private Function DetermineType(source As Collection) As SourceType
' Type I:
' No real unique text.
' What will decide if a file is TypeI is it is TypeI if it is not a TypeII file.
' It will contain this block similar to TypeIIA
'Minimum fee NOK 1 1*000.00
' 1*000.00
' Total NOK
' 1*000.00
' Type II
' All TypeII files contain a block of lines like this:
'Svenska Handelsbanken AB Svenska Handelsbanken Telephone
'Registered office: Stockholming Business ID: NO 971 17 Fax 22 39 70
'Registered in the bank register Postboks 1342 VIKA +47 22 39 72
'Swedish Financial Supervisory A N-0113 Oslo SWIFT
Private Function StartofTypeIIBlock(source As Collection) As Integer
Dim ret As Integer
ret = -1 ' Default not found
' All TypeII files contain a block of lines like this:
'Svenska Handelsbanken AB Svenska Handelsbanken Telephone
'Registered office: Stockholming Business ID: NO 971 17 Fax 22 39 70
'Registered in the bank register Postboks 1342 VIKA +47 22 39 72
'Swedish Financial Supervisory A N-0113 Oslo SWIFT
'Business organisation number: 5 Norway HANDNOKK
' find the line starting with: "Business organisation number:"
' assume this string only exists in this file type and will be at the start of a line.
Dim line As String
Dim i As Integer
Dim positions() As Integer
If source.count >= 5 Then ' should be at least 5 lines
For i = 5 To source.count
line = source.Item(i)
If Contains(line, "Business organisation number:", positions) Then
If positions(0) = 1 Then
ret = i - 4 ' 4 preceeding lines to start of block.
End If
End If
If ret <> -1 Then Exit For
Next i
End If
StartofTypeIIBlock = ret
End Function
Private Function StartOfMinimumFeeBlock(source As Collection, Optional startAtLine As Integer = 1) As Integer
Dim ret As Integer
ret = -1
'Minimum fee NOK 1 1*000.00
' 1*000.00
' Total NOK
' 1*000.00
Dim line As String
Dim i As Integer
Dim positions() As Integer
If source.count >= startAtLine Then
For i = 5 To source.count
line = source.Item(i)
If Contains(line, "Minimum fee ", positions) Then
If positions(0) = 1 And UBound(positions) = 0 Then
' check for "NOK 1"
Dim tmp() As Integer
If Contains(line, "NOK 1", tmp) Then
If UBound(tmp) = 0 Then ' probably have it
' check for 3rd line & 4th lines in block
If source.count > (i + 3) Then
If Contains(source.Item(i + 2), "Total NOK", tmp) Then ' assume found it
ret = i
End If
End If 'source.count > (i + 3)
End If 'UBound(tmp) = 0
End If 'Contains(line, "NOK 1", tmp)
End If ' positions(0) = 1 And UBound(posistions) = 0
End If 'Contains(line, "Minimum fee ", positions)
If ret <> -1 Then Exit For
Next i
End If ' source.count >= startAtLine
StartOfMinimumFeeBlock = ret
End Function
Public Function Contains(searchString As String, findString As String, positions() As Integer, Optional compare As VbCompareMethod = VbCompareMethod.vbTextCompare) As Boolean
' positions will be filled with the start position of each occurence of findString in searchString
They are given in one big text file i'm afraid, but, it should be possible to divide it into many smaller text files? They are divided within the big file with 21 blank lines (more often then not, atleast a noticable/programmable space) so just do an
Code:
for 1 to endlines
IF count(lines = "") > 15 then
cut above lines, and store in new file,
next
or something?
or just loop through the bigger file in some way?
The formats are the ones you have chosen, might be some discrepancies but those I can amend on my own, I like your approach, like many things in VBA this one is new to me. Lovely to learn!
However, I am struggeling to get a grasp of the code/functions you have made. How much (other then filepath) will I have to customize?
The functions are used to determine the filetypes, or rather if a "block? then I have to customize
Code:
Select Case fileType
Case SourceType.TypeI
' Stop
'process this file
Case SourceType.TypeIIA
' Stop
'process this file
Case SourceType.TypeIIB
' Stop
'process this file
End Select
End If 'lines.count > 0
to dertermine what is done with this block, the " 'process this file" part?
is this the correct interpretation?
Last edited by Arithos; Jul 8th, 2014 at 05:50 AM.
They are given in one big text file i'm afraid, but, it should be possible to divide it into many smaller text files? They are divided within the big file with 21 blank lines (more often then not, atleast a noticable/programmable space) so just do an
Ok, I will need to re-think this a bit. Also, I just realized that I misinterpreted the file.
The block
HTML Code:
Svenska Handelsbanken AB Svenska Handelsbanken Telephone
Registered office: Stockholming Business ID: NO 971 17 Fax 22 39 70
Registered in the bank register Postboks 1342 VIKA +47 22 39 72
Swedish Financial Supervisory A N-0113 Oslo SWIFT
Business organisation number: 5 Norway HANDNOKK
appears at the bottom of the the first three pages, but is omitted on the fourth page. I originally thought that it was the start of a page.
I guess the formfeed character after it should have given me a clue.
Here is a pdf of the file to illustrate the pagination based on the formfeed characters. toforum.pdf
Will there always be four pages laid-out in this order?
There will be about 1200 lines in a file, so alot of them =) but they layout is the same for each page. Assume there are more, and they can vary, though, it should be possible to split one text file into many smaller ones yes? and store them in a folder and then go into each one an extract the information.. if that make it easier
I've marked again the info I want to extract in a pdf toforum .pdf
There will be about 1200 lines in a file, so alot of them =) but they layout is the same for each page. Assume there are more, and they can vary, though, it should be possible to split one text file into many smaller ones yes? and store them in a folder and then go into each one an extract the information.. if that make it easier
I've marked again the info I want to extract in a pdf
Assuming that each page is terminated by a formfeed character (except last page), then breaking the file appart by pages is pretty easy.
After reviewing your highlighted pdf, I have identified these items that must be extracted from each page (I assume that you just forgot to highlight the customer name on the last page).
Customer Name
One of these cases
Minimum fee
Itemized Fee Data (Average Value and Fee subtotal value)
SafeKeeping Fee
Transaction Fee
Transaction Fee/Repair Fee
Account Number
Since you are doing this in Excel, is it safe to assume that the results from each page will represent a row (record) in a table?
I will take another look at this later and prototype it out.
Well originally I said would not write the entire parsing code, but based on my current understanding of the file format (much less tedious) I went ahead and did it anyways. The code currently store the records in an array, you should modify that to write to a worksheet.
I tried to comment as much as possible so hopefully you will be able to follow the logic. It may seem like there is a lot of functions, but this makes it easier to write by breaking it down into smaller parts that can be debugged on their own.
The entry point is the "test" subroutine.
VBA Code:
Option Explicit
' the following are User Defined Types that define a record
' they must be declared at the beginning of a Module
Public Function StartsWith(sourceString As String, findString As String, Optional compare As VbCompareMethod = VbCompareMethod.vbTextCompare) As Boolean
Public Function Contains(sourceString As String, findString As String, positions() As Integer, Optional compare As VbCompareMethod = VbCompareMethod.vbTextCompare) As Boolean
' positions will be filled with the start position of each occurence of findString in sourceString
Since I got a "Type mismatch" error, now the whole thing runs tho.
I can see the stored records at the end of the "Sub ExtractData", it only contains data from the first page tho. but as "line" its already captured the next pages "customer". after this I should "continue" in the test subroutine to store the data where I want it? then it will continue to the next page?
Per now it just ends if I run it, so there is Some customization I have to do, I'm just trying to get a decent understanding of how you've built up your code Its abit tricky to get a grasp of what you've done (love youre comments tho, VBA for dummies approach, I like
Q1: is records stored as a three-dimensional array?
(And yes, you hva the right understanding of the buildup of the data.)