-
Jan 4th, 2011, 07:42 PM
#1
Thread Starter
PowerPoster
Mass Storage Question
Here's an interesting one--how many bytes of storage are required on a hard drive to store every single word ever written by humans in any language that humans have ever used for communication?
Can you provide a reasonably close estimate? Please advise.
-
Jan 4th, 2011, 07:58 PM
#2
Re: Mass Storage Question
Just one byte, a really really big bite!
VB/Office Guru™ (AKA: Gangsta Yoda™ ®)
I dont answer coding questions via PM. Please post a thread in the appropriate forum.
Microsoft MVP 2006-2011
Office Development FAQ (C#, VB.NET, VB 6, VBA)
Senior Jedi Software Engineer MCP (VB 6 & .NET), BSEE, CET
If a post has helped you then Please Rate it!
• Reps & Rating Posts • VS.NET on Vista • Multiple .NET Framework Versions • Office Primary Interop Assemblies • VB/Office Guru™ Word SpellChecker™.NET • VB/Office Guru™ Word SpellChecker™ VB6 • VB.NET Attributes Ex. • Outlook Global Address List • API Viewer utility • .NET API Viewer Utility •
System: Intel i7 6850K, Geforce GTX1060, Samsung M.2 1 TB & SATA 500 GB, 32 GBs DDR4 3300 Quad Channel RAM, 2 Viewsonic 24" LCDs, Windows 10, Office 2016, VS 2019, VB6 SP6
-
Jan 4th, 2011, 08:02 PM
#3
Thread Starter
PowerPoster
Re: Mass Storage Question
-
Jan 4th, 2011, 09:15 PM
#4
Re: Mass Storage Question
Originally Posted by Code Doc
Here's an interesting one--how many bytes of storage are required on a hard drive to store every single word ever written by humans in any language that humans have ever used for communication?
Can you provide a reasonably close estimate? Please advise.
You aren't providing enough information. Are you including merely published works, or are you also including all my email, the grocery list in my pocket, etc.?
Plus, there are over 4000 currently spoken languages, but not all of them even have a written form, and some share one. Mandarin and Cantonese, for example, use the same written text. Do we include dialects here? You get my drift. Some estimating must be done. I seem to remember someone once computing that an encyclopedia would fit on a floppy disk, so I am pretty sure a terabyte would be plenty big.
Last edited by Lord Orwell; Jan 4th, 2011 at 09:20 PM.
-
Jan 4th, 2011, 10:23 PM
#5
Re: Mass Storage Question
Originally Posted by Code Doc
Can you provide a reasonably close estimate?
No..
-
Jan 5th, 2011, 12:33 AM
#6
Re: Mass Storage Question
Do we count only published (printed) material, or any writing or scribble in general? Do we count copies (for printed material, do we count a book just once, or once for every copy printed)?
-
Jan 5th, 2011, 01:35 AM
#7
Fanatic Member
Re: Mass Storage Question
You're gonna need a hell of a lot of space on a hard drive, and time to fill it.
-
Jan 5th, 2011, 04:08 AM
#8
Hyperactive Member
Re: Mass Storage Question
-
Jan 5th, 2011, 08:46 PM
#9
Thread Starter
PowerPoster
Re: Mass Storage Question
OK, I'll try to define the problem more accurately. Let's assume:
(1) All different words written by all humans in any language. Matched words are to be ignored. Common slang words are acceptable.
(2) Words composed of multiple words do not count. These must be separate words and not joined by hyphens or combined by inept text messagers or those text messagers trying to show off.
(3) Trivial concocted abbreviations and combinations are not to be included, such as URDum, PITA, and TIA.
Regardless of these restrictions, I have been told that the answer is many Petabytes, even with compression. I am having a hard time believing that.
Now what would it take to store all words?
-
Jan 5th, 2011, 08:58 PM
#10
Re: Mass Storage Question
What about when new words are created or evolve while you are writing to the hard drive? Will you have an ever-continuing process handling new words? This will be a never-ending task, so we cannot tell you how much space.
-
Jan 5th, 2011, 11:35 PM
#11
Re: Mass Storage Question
Originally Posted by Code Doc
(2) Words composed of multiple words do not count. These must be separate words and not joined by hyphens or combined by inept text messagers or those text messagers trying to show off.
Most languages contain many words formed from other words, or words with prefixes and suffixes.
What about languages which don't have words?
-
Jan 6th, 2011, 01:50 AM
#12
Re: Mass Storage Question
Originally Posted by penagate
What about languages which don't have words?
Which reminds me, my wife had this strange look the other day; afterwards she blamed me for not ..... Sounds like a language without words
You're welcome to rate this post!
If your problem is solved, please use the Mark thread as resolved button
Wait, I'm too old to hurry!
-
Jan 6th, 2011, 09:16 AM
#13
Re: Mass Storage Question
So just words? Not phrases, sentences, paragraphs.... Only unique words regardless of written language?
According to Google, they have a nice collection of over 13 million unigrams (words). Assuming 5.1 characters per word, that is about 66 megabytes (more like 80 MB with metadata). I believe it's limited to English, though. Assuming around 5000 written languages, perhaps 323 (391) GB is a good upper-limit figure. This is using zero compression. Compression would be interesting on such a unique dataset. There's also the matter of delimiters, and potentially the character sets used (metadata).
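The arithmetic behind those figures can be sketched as follows. This is only a rough reconstruction: the exact unigram count of 13,588,391 is my assumption for "over 13 million", one byte per character is assumed, and the sizes are read in binary units (MiB/GiB), which is what makes the 66 and 323 figures come out.

```python
# Back-of-the-envelope size of a plain word list.
UNIGRAMS = 13_588_391   # assumption: exact count behind "over 13 million"
AVG_CHARS = 5.1         # average characters per word, from the post
LANGUAGES = 5000        # written-language count, from the post
MIB = 1024 ** 2
GIB = 1024 ** 3


def word_list_bytes(words, avg_chars, bytes_per_char=1):
    """Raw size of a word list, ignoring delimiters and metadata."""
    return words * avg_chars * bytes_per_char


one_language = word_list_bytes(UNIGRAMS, AVG_CHARS)
all_languages = one_language * LANGUAGES

print(f"one language:  {one_language / MIB:.0f} MiB")
print(f"all languages: {all_languages / GIB:.0f} GiB")
```

Scripts that need more than one byte per character would raise the total; `bytes_per_char` is there so that guess can be changed.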
Software I use and highly recommend: Opera, Miranda IM, Peerblock, Winamp, Unlocker Assistant, JoyToKey, Virtual CloneDrive, Secunia PSI, ExplorerXP, GOM Player, Real Alternative, Quicktime Alternative,Sumatra PDF, and non-freeware: Photoshop and VB6().
My codebank: AllRGB, Rounded Rectangle(math), Binary Server, Buddy Paint, LoadPictureGDI+, System GUID/Volume Serial, HexToAsc, List all processes and their paths, quasiString matching
Strings(search, extraction, retrieval etc): Retrieve BBCode Link from HTML, RemoveBetween ()'s, strFindBetween(str1,str2), Insert text in HTML, HTML - GetSpanByID
-
Jan 6th, 2011, 09:29 AM
#14
Re: Mass Storage Question
Originally Posted by FireXtol
Compression would be interesting on such a unique dataset.
Since we are talking about unique datasets, I don't think that compression will be that "interesting"
-
Jan 6th, 2011, 11:51 AM
#15
Re: Mass Storage Question
I suppose if the words were randomly unique, the gain from compression would be trivial. But language presents a lot of redundancy, and the character sets are limited; those are two good indicators that compression can be substantial. I'd imagine a 90% compression ratio would be possible, shrinking the upper limit to under 40 GB.
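A quick way to see the redundancy argument is to compress a small word list and look at the space saved. This is only a toy sketch: the stems and suffixes are made up for illustration (some combinations are not real words), zlib is just one convenient compressor, and the result says nothing definite about whether 90% is reachable on real data.

```python
import zlib

# Made-up vocabulary: a few stems combined with common suffixes.
STEMS = ["store", "compress", "count", "print", "word", "write", "type", "read"]
SUFFIXES = ["", "s", "ed", "ing", "er", "ers", "able"]


def build_word_list():
    """Sorted, newline-delimited list of unique words as ASCII bytes."""
    words = sorted({stem + suffix for stem in STEMS for suffix in SUFFIXES})
    return "\n".join(words).encode("ascii")


def space_saved(data):
    """Fraction of space saved by zlib at its highest compression level."""
    packed = zlib.compress(data, 9)
    assert zlib.decompress(packed) == data  # compression must be lossless
    return 1 - len(packed) / len(data)


raw = build_word_list()
saved = space_saved(raw)
print(f"{len(raw)} bytes raw, {saved:.0%} saved")

# The post's guess applied to the uncompressed upper limit.
UPPER_LIMIT_GB = 391
REDUCTION = 0.90
print(f"{UPPER_LIMIT_GB * (1 - REDUCTION):.1f} GB after a 90% reduction")
```

Every word in the list is unique, yet shared prefixes and a small alphabet still leave the compressor plenty to work with.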
-
Jan 6th, 2011, 12:35 PM
#16
Fanatic Member
Re: Mass Storage Question
You should think about how many languages there are... Think about it: the English dictionary contains a bit over 700,000 words, so all languages together would be millions of words.
-
Jan 6th, 2011, 01:21 PM
#17
Re: Mass Storage Question
Originally Posted by Justa Lol
You should think about how many languages there are... Think about it: the English dictionary contains a bit over 700,000 words, so all languages together would be millions of words.
Hmmm. Point taken. Google's 13-odd million unigrams times 5000 written languages is about 68 billion words. I'm not sure compression could reduce the number of bytes to lower than the word count. That'd be really impressive! Perhaps 70 GB is a more reasonable upper limit given these assumptions/figures.
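To make the word-count floor concrete, here is a small sketch of how many bytes per word each budget would allow. The exact unigram count is again my assumption for "13 and some odd million", and the budgets are taken as decimal gigabytes (10^9 bytes).

```python
# How many bytes per word does a given storage budget leave?
UNIGRAMS = 13_588_391   # assumption: exact figure behind "13 and some odd million"
LANGUAGES = 5000        # written-language count used in the thread

total_words = UNIGRAMS * LANGUAGES


def bytes_per_word(budget_gb):
    """Average bytes available per word for a budget in decimal gigabytes."""
    return budget_gb * 10**9 / total_words


print(f"{total_words:,} words")
for budget in (40, 70):
    print(f"{budget} GB -> {bytes_per_word(budget):.2f} bytes per word")
```

A 40 GB budget leaves less than one byte per word, while 70 GB leaves just over one, which is the reasoning behind the revised limit.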
-
Jan 6th, 2011, 08:25 PM
#18
Banned
Re: Mass Storage Question
Average: 70,000 words per language, 80 languages currently, and a language updates every 300 years
or so. A text file of a dictionary is about 3 MB, and humans have existed for 200,000 years.
But how many communities and types of humans were there on average in the past? Do a range.
Do Neanderthals count?
I guess a gigabyte is more than enough.
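One possible way to combine those figures is sketched below. It rests on two assumptions that are not in the post: that there were always 80 languages, and that each language's dictionary is replaced wholesale every 300 years. Both are rough guesses, which is why the post asks for a range.

```python
# Rough range from the figures in the post (all sizes in decimal megabytes).
DICT_MB = 3              # one dictionary as a text file
LANGUAGES_NOW = 80       # language count given in the post
TURNOVER_YEARS = 300     # how often a language "updates"
HUMAN_YEARS = 200_000    # how long humans have existed

current_mb = DICT_MB * LANGUAGES_NOW

# Assumption: every turnover produces a completely new dictionary.
generations = HUMAN_YEARS // TURNOVER_YEARS
history_mb = current_mb * generations

print(f"current languages only: {current_mb} MB")
print(f"with full turnover history: {history_mb / 1000:.2f} GB")
```

Under this reading the present-day dictionaries fit comfortably in a gigabyte, while the full-history total depends heavily on the turnover guess and on how far back written words actually go.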
-
Jan 6th, 2011, 08:31 PM
#19
Re: Mass Storage Question
Originally Posted by moti barski
Average: 70,000 words per language, 80 languages currently, and a language updates every 300 years
or so. A text file of a dictionary is about 3 MB, and humans have existed for 200,000 years.
But how many communities and types of humans were there on average in the past? Do a range.
Do Neanderthals count?
I guess a gigabyte is more than enough.
I would say they don't, since they didn't have a written language. Just pictograms.
-
Jan 6th, 2011, 08:59 PM
#20
Fanatic Member
Re: Mass Storage Question
I think there are more than 80 languages? I speak 8 languages myself, and being able to speak 10% of the current languages seems a bit much.
There are 192 or 196 countries in the world, depending on how you define a country (193 or 197 if the Faroe Islands count as a country rather than a part of Denmark; they are only a member of the Danish kingdom). And I bet over half of those have their own language.
-
Jan 6th, 2011, 09:25 PM
#21
Thread Starter
PowerPoster
Re: Mass Storage Question
Originally Posted by FireXtol
So just words? Not phrases, sentences, paragraphs.... Only unique words regardless of written language?
According to Google, they have a nice collection of over 13 million unigrams (words). Assuming 5.1 characters per word, that is about 66 megabytes (more like 80 MB with metadata). I believe it's limited to English, though. Assuming around 5000 written languages, perhaps 323 (391) GB is a good upper-limit figure. This is using zero compression. Compression would be interesting on such a unique dataset. There's also the matter of delimiters, and potentially the character sets used (metadata).
I tend to agree. You could likely store all unique words that have ever been written in all of human history in half a terabyte. Further advances in compression could shrink that somewhat, but I am not sure there is any more payoff in that. Mass storage expansion and communication speeds have trumped that development, the same way that the Internet has all but crushed the compact disc and the floppy disk.
-
Jan 6th, 2011, 10:45 PM
#22
Re: Mass Storage Question
Originally Posted by Code Doc
I tend to agree. You could likely store all unique words that have ever been written in all of human history in half a terabyte. Further advances in compression could shrink that somewhat, but I am not sure there is any more payoff in that. Mass storage expansion and communication speeds have trumped that development, the same way that the Internet has all but crushed the compact disc and the floppy disk.
Compression on storage media seems to be passé. However, transmitted data will probably be compressed for years to come.
-
Jan 7th, 2011, 10:11 AM
#23
Fanatic Member
Re: Mass Storage Question
Originally Posted by BillGeek
Who'll do the re-typing?
The monkeys of course! Bonus points if they churn out a Shakespeare play.
-
Jan 7th, 2011, 01:41 PM
#24
Fanatic Member
Re: Mass Storage Question
Originally Posted by kregg
The monkeys of course! Bonus points if they churn out a Shakespeare play.
the monkeys are currently busy, they're working for youtube now.
-
Jan 8th, 2011, 04:12 AM
#25
Re: Mass Storage Question
Originally Posted by Justa Lol
the monkeys are currently busy, they're working for youtube now.
That explains a LOT of the videos on there
-
Jan 8th, 2011, 06:00 PM
#26
Fanatic Member
Re: Mass Storage Question
Originally Posted by NickThissen
That explains a LOT of the videos on there
i suppose so, but i meant the "a team of highly trained monkeys have been dispatched to deal with the situation" error message xD
-
Jan 15th, 2011, 09:18 PM
#27
Re: Mass Storage Question
Originally Posted by Justa Lol
i suppose so, but i meant the "a team of highly trained monkeys have been dispatched to deal with the situation" error message xD
I laugh so hard when I see that error message