Thread: Difficult File Compare

  1. #1

    Thread Starter
    Lively Member FantastichenEin's Avatar
    Join Date
    Mar 2000
    Location
    dairy
    Posts
    106
    Ok,
    I have two files (4 MB each).
    These are data files and contain data as follows:

    A header
    10 - 12 Lines of data under the header

    This is repeated through the file.
    I need to create a process to check the data (one header and the data below it) from one file against the same data in the other.
    I need to know if the data is different (e.g. a header has a new line below it).

    Trouble is, the headers are in different places in one file compared to the other.

    Sorry if this is not explained well.
    Any help appreciated greatly

    ****

  2. #2
    Lively Member Kersey's Avatar
    Join Date
    Jun 1999
    Location
    The Netherlands
    Posts
    101

    More info requested...

    Are you already able to open both files?

    Are the headers unique?


  3. #3

    Thread Starter
    Lively Member FantastichenEin's Avatar
    Join Date
    Mar 2000
    Location
    dairy
    Posts
    106

    Info

    Yup, I can open both files.
    It needs to be memory efficient so I can load the files into arrays.
    Here is sample data.

    Code:
    0101,FR000000016447O0006       025,"CARREFOUR P3M PERP TSR B      ", 25, 203,    5000.00,"FRF",     762.24509,"EUR","EUR",     762.24509,"EUR",19890727,        ,20081215,"              ",    0.000,  12.00, 2, 15,   0.25,   1.00,    0.000,    0.000,20090727, 2,512, 1,     15.245,20090727,     762.24509, 1,   1.00,"FR0000164477", 3,19890727,  5.246801, 6,020800,"CARREFOUR                     ",
    0301,FR000000016447O0006       025,20000825,  0.401000,
    1001,FR000000016447O0006       025, 12,20010727,     39.99348,     39.99348,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 13,20020727,     40.25416,     40.25416,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 14,20030727,     40.25416,     40.25416,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 15,20040727,     40.25416,     40.25416,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 16,20050727,     49.78223,     49.78223,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 17,20060727,     49.78223,     49.78223,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 18,20070727,     49.78223,     49.78223,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 19,20080727,     49.78223,     49.78223,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016447O0006       025, 20,20090727,     53.59345,     53.59345,     762.24509,          20000, 100.0000, 100.0000,
    0101,FR000000016448O0000       025,"CARREFOUR P3M PERP TSR C      ", 25, 203,    5000.00,"FRF",     762.24509,"EUR","EUR",     762.24509,"EUR",19890727,        ,        ,"              ",    0.000,  12.00, 1, 15,   0.30,   1.00,    0.000,    0.000,20390727, 2,545, 1,     15.245,        ,       0.00000, 0,   0.00,"FR0000164485", 0,19890727,  5.296800, 6,020800,"CARREFOUR                     ",
    0301,FR000000016448O0000       025,20000825,  0.405000,
    1001,FR000000016448O0000       025, 12,20010727,     40.37460,     40.37460,       0.00000,          20000, 100.0000,   0.0000,
    1001,FR000000016448O0000       025, 13,2002072...........
    This data is received every day, and the received file is compared with the previous day's file. If there are any changes to the data, the changed records are written to a new file.


    Ok
    0101 is the header (it never changes); however, the position of the headers (line number) is often different. If any of the 1001 data is different, the whole record must be written to the new file.

    If you need any more info, just ask.
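
    For readers following along, here is a minimal Python sketch of one way to do the comparison described above. It is not the thread starter's solution (that was never posted). It assumes that every record starts with a line beginning with "0101,", that the second comma-separated field on that line (e.g. "FR000000016447O0006       025") uniquely identifies the record, and that only new or changed records from the newer file need to be reported. The function names group_records and changed_records are made up for this example.

```python
from typing import Dict, Iterable, List


def group_records(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group lines into records keyed by the second field of each 0101 line.

    Assumption: a line starting with "0101," opens a new record, and every
    following line belongs to that record until the next 0101 line.
    """
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("0101,"):
            # Assumed record key: the identifier in the second field.
            key = line.split(",", 2)[1].strip()
            records[key] = []
            current = records[key]
        if current is not None:
            current.append(line)
    return records


def changed_records(old_lines: Iterable[str],
                    new_lines: Iterable[str]) -> List[List[str]]:
    """Return whole records from the new file that are new or differ from the old file.

    Records are matched by key, so their position in each file does not matter.
    """
    old = group_records(old_lines)
    new = group_records(new_lines)
    return [block for key, block in new.items() if old.get(key) != block]
```

    Both functions accept any iterable of lines, so two open file objects can be passed in directly and each returned record written to the output file line by line. At 4 MB per file, holding both dictionaries in memory should not be a problem; for much larger files, storing a hash of each record instead of its lines would be one option.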
    ****

  4. #4
    transcendental analytic kedaman's Avatar
    Join Date
    Mar 2000
    Location
    0x002F2EA8
    Posts
    7,221
    OK, I know nothing about your file format, so let me start by asking what the header looks like. I assume it determines the data size? All headers must also be the same size, and the size should be specified in bytes, not lines.

    Then you can open the file in binary mode, read each header as a UDT, and read the data section that follows into a variable-length string, for instance. You can then compare the strings with = for a binary comparison, or use the Like operator for a case-insensitive comparison. Of course, you don't have to compare at all if the sizes specified in the headers differ. Then you just loop through the whole file. (By the way, do you have an overall header for the whole file in which you specify the number of records?)
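
    The comparison step described here could be sketched roughly as follows. This is a Python illustration rather than the VB code the post has in mind, and records_equal is a hypothetical helper name. It assumes each record's data has already been read into a single string.

```python
def records_equal(a: str, b: str, ignore_case: bool = False) -> bool:
    """Compare two record strings, either exactly or ignoring case."""
    if ignore_case:
        # Case-insensitive comparison of the two strings.
        return a.casefold() == b.casefold()
    # Exact comparison: strings of different length can never be equal,
    # so the length check lets us skip the character-by-character compare.
    return len(a) == len(b) and a == b
```

    For the sample data in this thread, which is mostly numeric, the exact comparison is probably the one that matters.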
    Use
    writing software in C++ is like driving rivets into steel beam with a toothpick.
    writing haskell makes your life easier:
    reverse (p (6*9)) where p x|x==0=""|True=chr (48+z): p y where (y,z)=divMod x 13
    To throw away OOP for low level languages is myopia, to keep OOP is hyperopia. To throw away OOP for a high level language is insight.

  5. #5

    Thread Starter
    Lively Member FantastichenEin's Avatar
    Join Date
    Mar 2000
    Location
    dairy
    Posts
    106

    Thanks

    Just a quick thanks,
    I managed to solve the problem
    Cheers
    ****
