Results 1 to 9 of 9

Thread: dividing output into seperate text files

  1. #1

    Thread Starter
    Member
    Join Date
    Jan 2007
    Posts
    62

    dividing output into seperate text files

    ok, situation looks like this:

    i have a rich text box filled with words, say, 30,000 words for instance all on different lines. i need to be able to press a button and divide this richtext1.text into output of 30 text files with 1,000 words in each of them

    i need it to work with any number of words, so say if there were 23.3k words in the text box it would make 24 files, and the fourth would have 300 words instead of 1000.

    ive been toying around but unable to figure it out, any help it greatly appreciated!

    thanks in advance, gur00z

  2. #2
    Frenzied Member
    Join Date
    Jun 2006
    Posts
    1,098

    Re: dividing output into seperate text files

    Do you need RichText, or is this simply a list of words?

  3. #3

    Thread Starter
    Member
    Join Date
    Jan 2007
    Posts
    62

    Re: dividing output into seperate text files

    im just using a rich text box because you can store more words in it than a regular textbox, but i need the contents of the richtextbox divided into .txt documents in increments of 1,000

  4. #4
    PowerPoster Ellis Dee's Avatar
    Join Date
    Mar 2007
    Location
    New England
    Posts
    3,530

    Re: dividing output into seperate text files

    Totally untested:
    Code:
    Private Type StringArrayType
        Words() As String
    End Type
    
    Private Sub Command1_Click()
        Dim strArray() As String
        Dim typFiles() As StringArrayType
        Dim i As Long
        Dim iMax As Long
        Dim lngWord As Long
        Dim lngFile As Long
        Dim lngFileMax As Long
        Dim lngLastSize As Long
        
        ' Split words into array
        strArray = Split(richtext1.Text, vbNewLine)
        iMax = UBound(strArray)
        ' Identify how many files will be needed
        lngFileMax = iMax \ 1000
        ReDim typFiles(lngFileMax)
        lngFile = -1
        ' How many words in last file?
        lngLastSize = (iMax + 1) Mod 1000
        If lngLastSize = 0 Then lngLastSize = 999
        ' Copy words to temporary file arrays
        For i = 0 To iMax
            If lngWord = 0 Then
                lngFile = lngFile + 1
                With typFiles(lngFile)
                    If lngFile = lngFileMax Then ReDim .Words(lngLastSize) Else ReDim .Words(999)
                End With
            End If
            typFiles(lngFile).Words(lngWord) = strArray(i)
            If lngWord = 999 Then lngWord = 0 Else lngWord = lngWord + 1
        Next
        ' Re-use strArray to hold joined word lists
        ReDim strArray(lngFileMax)
        For i = 0 To lngFileMax
            With typFiles(i)
                strArray(i) = Join(.Words, vbNewLine)
                Erase .Words
            End With
        Next
        Erase typFiles
        ' Create the files
        For i = 0 To lngFileMax
            SaveStringToFile App.Path & "\Words" & Format(i + 1, "00"), strArray(i)
        Next
        Erase strArray
    End Sub
    
    Public Sub SaveStringToFile(pstrFile As String, pstrText As String)
        Dim FileNumber As Long
        
        If Len(Dir(pstrFile)) <> 0 Then Kill pstrFile
        FileNumber = FreeFile
        Open pstrFile For Output As #FileNumber
        Print #FileNumber, pstrText
        Close #FileNumber
    End Sub

  5. #5
    Frenzied Member
    Join Date
    Jun 2006
    Posts
    1,098

    Re: dividing output into seperate text files

    Let's keep it simple:
    Code:
    Private Sub Command1_Click()
      Dim strText As String
      Dim lngFileCount As Long
      Dim lngPos As Long
      Dim lngCount As Long
      Dim lngStart As Long
      Dim fNum As Integer
      
      strText = RichTextBox1.Text ' Copy the text to temporary variable
      lngStart = 1 ' Begining of string
      lngPos = InStr(1, strText, vbNewLine) ' Find first line break
      Do Until lngPos = 0 ' Loop until all line breaks have been found
        lngCount = lngCount + 1 ' Count the number of line breaks
        If lngCount = 1000 Then ' If we have 1000 line breaks...
          fNum = FreeFile
          ' Open a text file, appending lngFileCount for unique filenames
          Open App.Path & "\Words" & Format(lngFileCount, "00") & ".txt" For Output As #fNum
          ' Print the text from lngStart up to the 1000th line break
          Print #fNum, Mid$(strText, lngStart, lngPos - lngStart);
          Close #fNum
          lngFileCount = lngFileCount + 1 ' Increment lngFileCount
          lngCount = 0 ' Reset count of line breaks
          lngStart = lngPos + 2 ' Start at begining of next line
        End If
        lngPos = InStr(lngPos + 2, strText, vbNewLine) ' Find next line break
      Loop
      fNum = FreeFile
      ' Output the remainder of the string, if there is anything left
      If lngStart <= Len(strText) Then
        Open App.Path & "\Words" & Format(lngFileCount, "00") & ".txt" For Output As #fNum
        Print #fNum, Mid$(strText, lngStart);
        Close #fNum
        lngFileCount = lngFileCount + 1
      End If
      MsgBox CStr(lngFileCount) & " files created"
    End Sub

  6. #6
    New Member
    Join Date
    Sep 2009
    Posts
    4

    Re: dividing output into seperate text files

    Logophobic, thanks for your code.

    I&#180;m trying to use for a different way but I have a problem,
    I&#180;m working with text files sizes, more or less 2, 4 gbs, so is impossible to load all the information at one time.

    So I need the new files be divide in the number of lines I want but written by memory blocks of 64k (example).


    Can any one help me?

    Regards,
    Last edited by anonimou; Nov 10th, 2009 at 06:52 PM.

  7. #7
    PowerPoster Code Doc's Avatar
    Join Date
    Mar 2007
    Location
    Omaha, Nebraska
    Posts
    2,354

    Question Re: dividing output into seperate text files

    Quote Originally Posted by anonimou View Post
    Logophobic, thanks for your code.

    I´m trying to use for a different way but I have a problem. I´m working with text files sizes, more or less 2, 4 gbs, so is impossible to load all the information at one time.

    So I need the new files be divide in the number of lines I want but written by memory blocks of 64k (example).

    Can any one help me?

    Regards,
    This is not going to be easy by any means. Both FileLen() and LOF() are limited in size by 2^31 characters.

    I believe you need to rethink the entire problem and figure out why you really need to do this precise file size splitting. Why do the the first N files from the break out have to be exactly the same size?
    Doctor Ed

  8. #8
    New Member
    Join Date
    Sep 2009
    Posts
    4

    Re: dividing output into seperate text files

    Hi Code Doc,

    The Files don&#180;t need to have exactly the same size.

    Example:
    I have a file with 4 Gbs of information and I want to split this file in 10 files of 400 mgbs.
    But I d&#243;n&#180;t have any problem if the last file who will receive what left of the original file
    close with 1mgb
    Or instead of 10 files the program split in 9 or 11 files.

    I olny need the split don&#180;t break any line information and pass what left of that line to the next file.


    Regards.

  9. #9
    PowerPoster dilettante's Avatar
    Join Date
    Feb 2006
    Posts
    24,487

    Re: dividing output into seperate text files

    Here is an attempt. You might find you need to fiddle with it some. I won't claim it is bug-free, but simple tests seem to show it working properly. As far as I know it never "abandons" any text from the end of the input file.

    It assumes three things: ANSI input file (not Unicode), lines end with CRLF or LF, and that within a given CHUNK_SIZE of characters a line break will occur. Otherwise it may well split in the middle of a line. It should always split at a "line" boundary otherwise, not within lines.

    On small files near DesiredPieces * CHUNK_SIZE in length it will probably produce fewer than DesiredPieces files. CHUNK_SIZE is 65536 unless you alter this. The "pieces" will be numbered like: Split001.txt, Split002.txt, etc. based on a supplied "pattern" for the output files of Split.txt.

    It should work for input files of up to a few hundred GB.

    Good luck!
    Attached Files Attached Files

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Click Here to Expand Forum to Full Width