Tuesday 3 April 2012

Binary content


Usenet was originally created to distribute text content encoded in the 7-bit ASCII character set. With the help of programs that encode 8-bit values into ASCII, it became practical to distribute binary files as content. Binary posts, due to their size and often-dubious copyright status, were in time restricted to specific newsgroups, making it easier for administrators to allow or disallow the traffic.
The oldest widely used encoding method for binary content is uuencode, from the Unix UUCP package. In the late 1980s, Usenet articles were often limited to 60,000 characters, and larger hard limits exist today. Files are therefore commonly split into sections that require reassembly by the reader.
With the header extensions and the Base64 and Quoted-Printable MIME encodings, there was a new generation of binary transport. In practice, MIME has seen increased adoption in text messages, but it is avoided for most binary attachments. Some operating systems with metadata attached to files use specialized encoding formats. For Mac OS, both Binhex and special MIME types are used.
Other lesser known encoding systems that may have been used at one time were BTOA, XX encoding, BOO, and USR encoding.
In an attempt to reduce file transfer times, an informal file encoding known as yEnc was introduced in 2001. It achieves about a 30% reduction in data transferred by assuming that most 8-bit characters can safely be transferred across the network without first encoding into the 7-bit ASCII space.
The standard method of uploading binary content to Usenet is to first archive the files into RAR archives (for large files usually in 15 MB, 50 MB or 100 MB parts) then create Parchive files. Parity files are used to recreate missing data. This is needed often, as not every part of the files reaches a server. These are all then encoded into yEnc and uploaded to the selected binary groups.

No comments: