Thread: Mass Storage Question

  1. #1

    Thread Starter
    PowerPoster Code Doc
    Join Date: Mar 2007
    Location: Omaha, Nebraska
    Posts: 2,354

    Mass Storage Question

    Here's an interesting one--how many bytes of storage are required on a hard drive to store every single word ever written by humans in any language that humans have ever used for communication?

    Can you provide a reasonably close estimate? Please advise.
    Doctor Ed

  2. #2
    Ex-Super Mod RobDog888
    Join Date: Apr 2001
    Location: LA, Calif. Raiders #1 AKA: Gangsta Yoda™
    Posts: 60,710

    Re: Mass Storage Question

    Just one byte, a really really big bite!
    VB/Office Guru™ (AKA: Gangsta Yoda®)
    I don't answer coding questions via PM. Please post a thread in the appropriate forum.

    Microsoft MVP 2006-2011
    Office Development FAQ (C#, VB.NET, VB 6, VBA)
    Senior Jedi Software Engineer MCP (VB 6 & .NET), BSEE, CET
    If a post has helped you then Please Rate it!
    Reps & Rating Posts | VS.NET on Vista | Multiple .NET Framework Versions | Office Primary Interop Assemblies | VB/Office Guru™ Word SpellChecker™.NET | VB/Office Guru™ Word SpellChecker™ VB6 | VB.NET Attributes Ex. | Outlook Global Address List | API Viewer utility | .NET API Viewer Utility
    System: Intel i7 6850K, Geforce GTX1060, Samsung M.2 1 TB & SATA 500 GB, 32 GBs DDR4 3300 Quad Channel RAM, 2 Viewsonic 24" LCDs, Windows 10, Office 2016, VS 2019, VB6 SP6

  3. #3

    Thread Starter
    PowerPoster Code Doc
    Join Date: Mar 2007
    Location: Omaha, Nebraska
    Posts: 2,354

    Re: Mass Storage Question

    Quote Originally Posted by RobDog888
    Just one byte, a really really big bite!
    Just one big byte?
    Gigabyte, Terabyte, Petabyte, or a larger byte?

  4. #4
    coder. Lord Orwell
    Join Date: Feb 2001
    Location: Elberfeld, IN
    Posts: 7,621

    Re: Mass Storage Question

    Quote Originally Posted by Code Doc
    Here's an interesting one--how many bytes of storage are required on a hard drive to store every single word ever written by humans in any language that humans have ever used for communication?

    Can you provide a reasonably close estimate? Please advise.
    You aren't providing enough information. Are you including merely published works, or are you also including all my email, my grocery list in my pocket, etc?
    Plus, there are over 4000 current spoken languages, but not all of them even have written words, and some share a writing system. Mandarin and Cantonese, for example, use the same written text. Do we include dialects here? You get my drift. Some estimating must be done. I seem to remember someone once computing that an encyclopedia would fit on a floppy disk, so I am pretty sure a terabyte would be plenty big.
    Last edited by Lord Orwell; Jan 4th, 2011 at 09:20 PM.
    My light show youtube page (it's made the news) www.youtube.com/@artnet2twinkly
    Contact me on the socials www.facebook.com/lordorwell

  5. #5
    I'm about to be a PowerPoster!
    Join Date: Jan 2005
    Location: Everywhere
    Posts: 13,647

    Re: Mass Storage Question

    Quote Originally Posted by Code Doc
    Can you provide a reasonably close estimate?
    No..

  6. #6
    Next Of Kin baja_yu
    Join Date: Aug 2002
    Location: /dev/root
    Posts: 5,989

    Re: Mass Storage Question

    Do we count only published (printed) material, or any writing or scribbling in general? Do we count copies (for printed material, do we count a book just once, or once for every copy printed)?

  7. #7
    Fanatic Member
    Join Date: Jun 2008
    Posts: 1,023

    Re: Mass Storage Question

    You're gonna need a hell of a lot of space, and time to fill it on a hard drive.

  9. #9

    Thread Starter
    PowerPoster Code Doc
    Join Date: Mar 2007
    Location: Omaha, Nebraska
    Posts: 2,354

    Re: Mass Storage Question

    OK, I'll try to define the problem more accurately. Let's assume:

    (1) All different words written by all humans in any language. Duplicate (matched) words are to be ignored. Common slang words are acceptable.

    (2) Words composed of multiple words do not count. These must be separate words and not joined by hyphens or combined by inept text messagers or those text messagers trying to show off.

    (3) Trivial concocted abbreviations and combinations are not to be included, such as URDum, PITA, and TIA.
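    Taken together, the three rules above amount to a filter over raw text. What follows is a minimal sketch under stated assumptions: duplicates are collapsed case-insensitively with `str.casefold`, hyphen-joined compounds are split into their parts (one possible reading of rule 2), and rule 3 is approximated by a hypothetical blocklist built only from the three examples given. The names `unique_words` and `CONCOCTED` are made up for illustration.

```python
import re

# Hypothetical blocklist for rule (3), built only from the examples in the post.
CONCOCTED = {"urdum", "pita", "tia"}

# Runs of letters in any script. Hyphens, digits, and punctuation act as
# separators, so "well-known" yields "well" and "known" (rule 2) instead of
# a new compound word.
LETTER_RUN = re.compile(r"[^\W\d_]+")

def unique_words(text: str) -> set[str]:
    words = set()                              # a set ignores repeats (rule 1)
    for token in LETTER_RUN.findall(text):
        word = token.casefold()                # "The" and "the" count once
        if word not in CONCOCTED:              # rule (3), via the blocklist
            words.add(word)
    return words

if __name__ == "__main__":
    print(sorted(unique_words("The cat saw the well-known cat. TIA!")))
    # prints: ['cat', 'known', 'saw', 'the', 'well']
```

    This will not catch closed compounds written without a hyphen, or abbreviations missing from the list; those would need a dictionary or human judgment.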

    Regardless of these restrictions, I have been told that the answer is many Petabytes, even with compression. I am having a hard time believing that.

    Now what would it take to store all words?

  10. #10
    Ex-Super Mod RobDog888
    Join Date: Apr 2001
    Location: LA, Calif. Raiders #1 AKA: Gangsta Yoda™
    Posts: 60,710

    Re: Mass Storage Question

    What about when new words are created or evolve while you're writing to the hard drive? Will you have a continuous process handling new words? This will be a never-ending task, so we cannot tell you how much space you'd need.

  11. #11
    I'm about to be a PowerPoster!
    Join Date: Jan 2005
    Location: Everywhere
    Posts: 13,647

    Re: Mass Storage Question

    Quote Originally Posted by Code Doc
    (2) Words composed of multiple words do not count. These must be separate words and not joined by hyphens or combined by inept text messagers or those text messagers trying to show off.
    Most languages contain many words formed from other words, or words with prefixes and suffixes.

    What about languages which don't have words?

  12. #12
    I don't do your homework! opus
    Join Date: Jun 2000
    Location: Good Old Europe
    Posts: 3,863

    Re: Mass Storage Question

    Quote Originally Posted by penagate
    What about languages which don't have words?
    Which reminds me: my wife had this strange look the other day, and afterwards she blamed me for not ..... Sounds like a language without words.
    You're welcome to rate this post!
    If your problem is solved, please use the Mark thread as resolved button


    Wait, I'm too old to hurry!

  13. #13
    Fanatic Member FireXtol
    Join Date: Apr 2010
    Posts: 874

    Re: Mass Storage Question

    So just words? Not phrases, sentences, paragraphs.... Only unique words regardless of written language?

    According to Google, they have a nice collection of over 13 million unigrams (words). Assuming 5.1 characters per word, that's about 66 megabytes (more like 80 MB with metadata). I believe it's limited to English, though. Assuming around 5000 written languages, perhaps 323 GB (391 GB with metadata) is a good upper-limit figure. This is using zero compression. Compression would be interesting on such a unique dataset. There's also the matter of delimiters, and potentially the character sets used (metadata).
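    The arithmetic in this post can be reproduced in a few lines. This is a minimal sketch using only the figures quoted above (13 million unigrams, 5.1 characters per word, about 5000 written languages); the function names are made up for illustration, and the `bytes_per_char` parameter is an added assumption reflecting that UTF-8 stores ASCII letters in 1 byte but characters from other scripts in 2 to 4 bytes.

```python
# Back-of-envelope storage estimate for a list of unique words.
# Inputs are the figures quoted in the post; they are assumptions, not
# measurements. Delimiters and metadata are ignored.

def raw_bytes(unique_words: int, avg_chars: float, bytes_per_char: float = 1.0) -> float:
    """Uncompressed size of the word text alone."""
    return unique_words * avg_chars * bytes_per_char

def to_gb(n_bytes: float) -> float:
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 10**9

def to_gib(n_bytes: float) -> float:
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30

if __name__ == "__main__":
    english = raw_bytes(13_000_000, 5.1)      # roughly 66 million bytes
    all_languages = english * 5000            # naive: every language as large as English
    print(f"English list:   {english / 10**6:.1f} MB")
    print(f"5000 languages: {to_gb(all_languages):.1f} GB = {to_gib(all_languages):.1f} GiB")
    # Same estimate if the average character needed 2 bytes (e.g. Cyrillic in UTF-8)
    print(f"At 2 bytes/char: {to_gb(raw_bytes(13_000_000, 5.1, 2) * 5000):.1f} GB")
```

    Whether a "GB" means 10^9 or 2^30 bytes shifts the result by about 7%, and the character encoding can double or triple it, so the output is best read as an order of magnitude: a few hundred gigabytes before compression.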

  14. #14
    I don't do your homework! opus's Avatar
    Join Date
    Jun 2000
    Location
    Good Old Europe
    Posts
    3,863

    Re: Mass Storage Question

    Quote Originally Posted by FireXtol
    Compression would be interesting on such a unique dataset.
    Since we're talking about a unique dataset, I don't think compression will be that "interesting".

  15. #15
    Fanatic Member FireXtol
    Join Date: Apr 2010
    Posts: 874

    Re: Mass Storage Question

    I suppose if the words were random and unique, compression would gain almost nothing. But language has a lot of redundancy, and character sets are limited; those are two good indicators that compression can be substantial. I'd imagine a 90% reduction in size would be possible, shrinking the upper limit to under 40 GB.
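    The redundancy argument can be made concrete. Even when every entry is unique, a sorted word list repeats itself because neighbouring words share prefixes. The sketch below uses front coding (each entry stores how many leading characters it shares with the previous word, plus the remaining suffix), a standard technique for sorted dictionaries swapped in here for illustration; nobody in the thread proposed it, and the sample words are invented.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters a and b have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def front_encode(words):
    """Front coding: sort, dedupe, then store (shared-prefix length, suffix) pairs."""
    encoded, prev = [], ""
    for word in sorted(set(words)):
        n = common_prefix_len(prev, word)
        encoded.append((n, word[n:]))
        prev = word
    return encoded

def front_decode(encoded):
    """Rebuild the sorted word list from (shared-prefix length, suffix) pairs."""
    words, prev = [], ""
    for n, suffix in encoded:
        prev = prev[:n] + suffix
        words.append(prev)
    return words

if __name__ == "__main__":
    sample = ["compress", "compressed", "compresses", "compressing",
              "compression", "compute", "computer", "computing"]
    encoded = front_encode(sample)
    print(encoded)
    raw = sum(len(w) for w in sample)
    kept = sum(len(s) for _, s in encoded)
    print(f"{raw} characters raw, {kept} suffix characters plus {len(encoded)} small counters")
    # prints: 74 characters raw, 23 suffix characters plus 8 small counters
    print(front_decode(encoded) == sorted(sample))   # round-trip check
```

    On this toy list the text to store drops from 74 to 23 characters plus one small counter per word. Real vocabularies and general-purpose compressors such as gzip behave differently, so this illustrates the redundancy point rather than confirming the 90% figure.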

  16. #16
    Fanatic Member
    Join Date: Jun 2008
    Posts: 1,023

    Re: Mass Storage Question

    You should think about how many languages there are... Think about it: the English dictionary contains a bit over 700,000 words, so all languages together would be millions of words.

  17. #17
    Fanatic Member FireXtol
    Join Date: Apr 2010
    Posts: 874

    Re: Mass Storage Question

    Quote Originally Posted by Justa Lol
    You should think about how many languages there are... Think about it: the English dictionary contains a bit over 700,000 words, so all languages together would be millions of words.
    Hmm. Point taken. Google's 13-odd million unigrams times 5000 written languages is about 68 billion words. I'm not sure compression could reduce the number of bytes below the word count; that'd be really impressive! Perhaps 70 GB is a more reasonable upper limit given these assumptions and figures.

  18. #18
    Banned
    Join Date: Mar 2009
    Posts: 764

    Re: Mass Storage Question

    Average: 70,000 words per language, 80 languages currently, and a language updates every 300 years
    or so. A text file of a dictionary is about 3 MB; humans have existed for 200,000 years.
    But how many communities and types of humans were there on average in the past? Give a range.
    Do Neanderthals count?

    I guess a GB is more than enough.
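    The post asks for a range, and its own figures can produce one. A rough sketch using the numbers above (80 languages, a ~3 MB dictionary file each, a vocabulary "update" every 300 years); the function name is made up, and treating each update as an entirely new vocabulary is an added worst-case assumption, so each result is an upper bound for its time span.

```python
# Range calculator for the estimate in the post above.
DICT_FILE_MB = 3          # "a text file of a dictionary is about 3 MB"
CURRENT_LANGUAGES = 80    # the post's count of current languages
YEARS_PER_UPDATE = 300    # "a language updates every 300 years or so"

def estimate_mb(span_years: int, languages: int = CURRENT_LANGUAGES) -> int:
    """Upper-bound size in MB, assuming every update replaces the whole vocabulary."""
    generations = max(1, span_years // YEARS_PER_UPDATE)
    return languages * DICT_FILE_MB * generations

if __name__ == "__main__":
    # 300 years: a single vocabulary; ~5,000 years: roughly the age of the
    # oldest known writing; 200,000 years: the post's figure for human existence.
    for span in (300, 5_000, 200_000):
        print(f"{span:>7,} years: up to {estimate_mb(span) / 1000:.1f} GB")
```

    With these inputs the three spans give up to about 0.2 GB, 3.8 GB, and 159.8 GB. The span chosen dominates the result far more than the per-language figures do, which is why the post's questions about past communities matter.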

  19. #19
    coder. Lord Orwell's Avatar
    Join Date
    Feb 2001
    Location
    Elberfeld, IN
    Posts
    7,621

    Re: Mass Storage Question

    Quote Originally Posted by moti barski
    Average: 70,000 words per language, 80 languages currently, and a language updates every 300 years
    or so. A text file of a dictionary is about 3 MB; humans have existed for 200,000 years.
    But how many communities and types of humans were there on average in the past? Give a range.
    Do Neanderthals count?

    I guess a GB is more than enough.
    I would say they don't, since they didn't have written language. Just pictograms.

  20. #20
    Fanatic Member
    Join Date: Jun 2008
    Posts: 1,023

    Re: Mass Storage Question

    I think there are more than 80 languages? I speak 8 languages myself, and being able to speak 10% of the world's current languages seems a bit much.

    There are 192 or 196 countries in the world, depending on how you define a country (or 193 or 197, since the Faroe Islands are a country, not a part of Denmark, only a member of the Danish kingdom). And I bet over half of those have their own language.

  21. #21

    Thread Starter
    PowerPoster Code Doc
    Join Date: Mar 2007
    Location: Omaha, Nebraska
    Posts: 2,354

    Re: Mass Storage Question

    Quote Originally Posted by FireXtol
    So just words? Not phrases, sentences, paragraphs.... Only unique words regardless of written language?

    According to Google, they have a nice collection of over 13 million unigrams (words). Assuming 5.1 characters per word, that's about 66 megabytes (more like 80 MB with metadata). I believe it's limited to English, though. Assuming around 5000 written languages, perhaps 323 GB (391 GB with metadata) is a good upper-limit figure. This is using zero compression. Compression would be interesting on such a unique dataset. There's also the matter of delimiters, and potentially the character sets used (metadata).
    I tend to agree. You could likely store all unique words that have ever been written in all of human history in half a terabyte. Further advances in compression could shrink that somewhat, but I am not sure there is any more payoff in that. Mass storage expansion and communication speeds have trumped that development, the same way that the Internet has all but crushed the compact disc and the floppy disk.

  22. #22
    coder. Lord Orwell
    Join Date: Feb 2001
    Location: Elberfeld, IN
    Posts: 7,621

    Re: Mass Storage Question

    Quote Originally Posted by Code Doc
    I tend to agree. You could likely store all unique words that have ever been written in all of human history in half a terabyte. Further advances in compression could shrink that somewhat, but I am not sure there is any more payoff in that. Mass storage expansion and communication speeds have trumped that development, the same way that the Internet has all but crushed the compact disc and the floppy disk.
    Compression on storage media seems to be passé. However, transmitted data will probably be compressed for years to come.

  23. #23
    Fanatic Member kregg
    Join Date: Feb 2006
    Location: UK
    Posts: 524

    Re: Mass Storage Question

    Quote Originally Posted by BillGeek
    Who'll do the re-typing?
    The monkeys of course! Bonus points if they churn out a Shakespeare play.

  24. #24
    Fanatic Member
    Join Date: Jun 2008
    Posts: 1,023

    Re: Mass Storage Question

    Quote Originally Posted by kregg
    The monkeys of course! Bonus points if they churn out a Shakespeare play.
    The monkeys are currently busy; they're working for YouTube now.

  26. #26
    Fanatic Member
    Join Date: Jun 2008
    Posts: 1,023

    Re: Mass Storage Question

    Quote Originally Posted by NickThissen
    That explains a LOT of the videos on there
    I suppose so, but I meant the "a team of highly trained monkeys have been dispatched to deal with the situation" error message xD

  27. #27

    Re: Mass Storage Question

    Quote Originally Posted by Justa Lol
    I suppose so, but I meant the "a team of highly trained monkeys have been dispatched to deal with the situation" error message xD
    I laugh so hard when I see that error message
