Thread: File Type Identification

  1. #1

    Thread Starter
    Frenzied Member sciguyryan's Avatar
    Join Date
    Sep 2003
    Location
    Wales
    Posts
    1,763

    File Type Identification

    Hey there guys!

    I have a question for anyone. Maybe some code already exists for this or a good algorithm is already written down somewhere.

    Basically, I'm trying to build a tool that identifies files of unknown type. I've been looking for a .NET implementation of a file-type matching algorithm, but I can't seem to find one.

    If there are none, can anyone point me in the general direction of writing one? I'm also looking for help with the project, so if anyone else is interested, let me know.

    Cheers

    Ryan Jones.

  2. #2
    Frenzied Member
    Join Date
    Jul 2008
    Location
    Rep of Ireland
    Posts
    1,380

    Re: File Type Identification

    I don't understand what you mean?

    You would know the filetype by knowing the path: C:\MyFile.TYPE

    It would simply be a case of using a regex to read everything after the . and check it against a list of some sort.
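
For reference, the extension-based approach described above might look like the sketch below. It is only an illustration: the `KNOWN_EXTENSIONS` table and the `type_from_extension` name are made up for this example, it uses `ntpath.splitext` rather than a regex, and it is written in Python for brevity even though the thread asks about .NET.

```python
import ntpath  # Windows-style path handling, works on any platform

# Hypothetical lookup table; a real list would be far longer.
KNOWN_EXTENSIONS = {
    ".pdf": "PDF document",
    ".doc": "Word document",
    ".html": "HTML file",
    ".iso": "ISO disk image",
}

def type_from_extension(path):
    """Return a label based only on the file name's extension, or None."""
    _, ext = ntpath.splitext(path)
    return KNOWN_EXTENSIONS.get(ext.lower())
```

This only inspects the name, so a renamed file is reported as whatever its extension claims.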

  3. #3
    Super Moderator jmcilhinney's Avatar
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    111,221

    Re: File Type Identification

    I assume that you mean that you would read the data of a file and determine whether it's a Word document, a PDF document, an AutoCAD drawing, an HTML file, etc. For a start, you would have to know the binary format of all the file types you want to be able to identify. You would then have to read the bytes of the file and compare the format to each of the known file types. When you find a match, you've found a match. Your aim is fairly unrealistic unless you are prepared to study all those different binary formats and write code to identify them in an arbitrary set of bytes.
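
A minimal sketch of the "compare the bytes to each known format" idea, assuming the simplest case where a format announces itself with fixed bytes at the very start of the file. The `SIGNATURES` table and `identify_by_header` are hypothetical names for this example, the table covers only a few well-known formats, and Python is used for brevity rather than .NET.

```python
# (leading bytes, label) pairs for a few well-known formats.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy Word .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx)"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
]

def identify_by_header(data):
    """Return the label of the first signature that data starts with, or None."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return None
```

In practice you would read only the first few hundred bytes of the file and pass them in. A match on the leading bytes is a strong hint, not proof, and container formats such as ZIP or OLE2 need a further look inside to tell which document type they hold.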

  4. #4

    Thread Starter
    Frenzied Member sciguyryan's Avatar
    Join Date
    Sep 2003
    Location
    Wales
    Posts
    1,763

    Re: File Type Identification

    Quote Originally Posted by DeanMc View Post
    I don't understand what you mean?

    You would know the filetype by knowing the path: C:\MyFile.TYPE

    It would simply be a case of using a regex to read everything after the . and check it against a list of some sort.
    File extensions are sometimes wrong, intentionally or otherwise, so that method of identification is not considered accurate.

    Quote Originally Posted by jmcilhinney View Post
    I assume that you mean that you would read the data of a file and determine whether it's a Word document, a PDF document, an AutoCAD drawing, an HTML file, etc. For a start, you would have to know the binary format of all the file types you want to be able to identify.
    Once I actually figure out a good algorithm, I'll generate these automatically using a large and varied sample set for each filetype to give the best accuracy I can.
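
One possible reading of "generate these automatically" is to take many sample files of a single type and keep the leading bytes they all share. The sketch below does only that. The `common_prefix` helper is a made-up name, and this is not necessarily the algorithm the poster has in mind.

```python
def common_prefix(samples):
    """Return the longest run of leading bytes shared by every sample."""
    if not samples:
        return b""
    shortest = min(samples, key=len)
    for i, byte in enumerate(shortest):
        # Stop at the first position where any sample disagrees.
        if any(sample[i] != byte for sample in samples):
            return shortest[:i]
    return shortest
```

A small or uniform sample set will produce an over-long prefix (for example, one that bakes in a version number), which is why a large and varied set matters.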

    Quote Originally Posted by jmcilhinney View Post
    You would then have to read the bytes of the file and compare the format to each of the known file types. When you find a match, you've found a match. Your aim is fairly unrealistic unless you are prepared to study all those different binary formats and write code to identify them in an arbitrary set of bytes.
    You are talking about header byte matching, correct? I thought about that, but it will need to be used in conjunction with other methods, since not all files use byte headers for identification. One of the most prominent examples is the ISO disk image.
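
To illustrate the ISO case: an ISO 9660 image has no marker at byte 0, but it does carry the identifier `CD001` at byte offset 32769, one byte into the first volume descriptor at sector 16. A sketch that checks signatures at arbitrary offsets follows. The `OFFSET_SIGNATURES` table and `identify_by_offset` are hypothetical names, and Python is used for brevity rather than .NET.

```python
# (byte offset, expected bytes, label) triples.
OFFSET_SIGNATURES = [
    (32769, b"CD001", "ISO 9660 disk image"),
    (257, b"ustar", "tar archive"),
]

def identify_by_offset(data):
    """Return the label of the first signature found at its offset, or None."""
    for offset, magic, label in OFFSET_SIGNATURES:
        # Slicing past the end of data just yields a shorter (non-matching) value.
        if data[offset:offset + len(magic)] == magic:
            return label
    return None
```

This means the identifier has to read well past the first few bytes (at least 32 KB plus the marker) before it can rule an ISO image in or out.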

    Ryan Jones.
