|
-
Jan 6th, 2011, 09:16 AM
#16
Re: Mass Storage Question
So just words? Not phrases, sentences, paragraphs.... Only unique words regardless of written language?
According to Google, they have a nice collection of over 13 million unigrams(words). Assuming 5.1 characters per word, about 66 megabytes(more like 80MB with metadata). I believe it's limited to English, though. Assuming around 5000 written languages, perhaps 323(391) GB is a good upper limit figure. This is using zero compression. Compression would be interesting on such a unique dataset. There's also a matter of delimiters, and potentially the character sets used(metadata).
Software I use and highly recommend: Opera, Miranda IM, Peerblock, Winamp, Unlocker Assistant, JoyToKey, Virtual CloneDrive, Secunia PSI, ExplorerXP, GOM Player, Real Alternative, Quicktime Alternative,Sumatra PDF, and non-freeware: Photoshop and VB6( ).
My codebank: AllRGB, Rounded Rectangle(math), Binary Server, Buddy Paint, LoadPictureGDI+, System GUID/Volume Serial, HexToAsc, List all processes and their paths, quasiString matching
Strings(search, extraction, retrieval etc): Retrieve BBCode Link from HTML, RemoveBetween ()'s, strFindBetween(str1,str2), Insert text in HTML, HTML - GetSpanByID
Posting Permissions
- You may not post new threads
- You may not post replies
- You may not post attachments
- You may not edit your posts
-
Forum Rules
|
Click Here to Expand Forum to Full Width
|