FTP data compression
RFC 468                  FTP Data Compression                  March 1973

The two main arguments for data compression are economics and convenience (usability). Consider first economics, which is essentially a trade-off between CPU time and transmission costs. Of course, as long as Network use is a free commodity, the economics of data compression are all bad. That happy state won't last forever.

What does data compression cost? Let us consider only simple linear compression schemes, such as the one proposed here. By linear, I mean that the CPU time to examine a source record is proportional to the number of bytes in the record. A simple linear scheme could detect repeated single characters, for example. One could imagine quadratic schemes, which detect repeated substrings; but except for possible special circumstances where the source strings have some structure known to the compression algorithm, the CPU economics don't favor quadratic compression.

Assuming a reasonable figure for large-scale CPU costs in the generation of CCN's 360/91, we concluded that an upper bound on CPU costs for total compression and decompression would be 5 cents per megabit; this is based on very loose coding of a simple linear algorithm. This may be compared with the projected Network transmission costs of over 30 cents per megabit (possibly a lot over). Thus, the CPU time spent to conserve bandwidth costs significantly less than the bandwidth saved. Both CPU costs and bandwidth costs are trending downward, but it seems exceedingly unlikely that the ratio of CPU cost to bandwidth cost for linear compression will reverse in the next few years. On the other hand, this calculation clearly discourages one from using quadratic compression.

WHY HASP

CCN's batch remote job entry protocol NETRJS (see RFC #189, July 15, 1971) was designed to include two data transfer modes, truncated and compressed. The NETRJS truncated mode is essentially identical to the current FTP block mode record structure (except for minor bit format differences).
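As an illustration of the "simple linear scheme" mentioned in the economics discussion, the following Python sketch detects repeated single characters in one pass over a record. It is not the HASP or NETRJS encoding: the token layout, the names compress_runs, expand_runs and net_saving_cents, and the min_run threshold are all assumptions made for this example. The cost function merely applies the 5 and 30 cents-per-megabit figures quoted in the text, assuming the CPU figure is charged per megabit of source data and the transmission figure per megabit actually sent.

```python
def compress_runs(data: bytes, min_run: int = 3) -> list:
    """Split a record into literal stretches and runs of one repeated byte.

    Every byte is examined once, so the work is proportional to len(data).
    Returns tokens of the form ("lit", bytes) or ("run", byte_value, count).
    """
    tokens = []
    literal = bytearray()          # bytes not belonging to a long-enough run
    i, n = 0, len(data)
    while i < n:
        j = i + 1
        while j < n and data[j] == data[i]:
            j += 1                 # extend the run of identical bytes
        run_len = j - i
        if run_len >= min_run:
            if literal:            # flush pending literal bytes first
                tokens.append(("lit", bytes(literal)))
                literal.clear()
            tokens.append(("run", data[i], run_len))
        else:
            literal.extend(data[i:j])
        i = j
    if literal:
        tokens.append(("lit", bytes(literal)))
    return tokens


def expand_runs(tokens: list) -> bytes:
    """Inverse of compress_runs: rebuild the original record."""
    out = bytearray()
    for token in tokens:
        if token[0] == "run":
            out.extend(bytes([token[1]]) * token[2])
        else:
            out.extend(token[1])
    return bytes(out)


def net_saving_cents(ratio: float, cpu: float = 5.0, net: float = 30.0) -> float:
    """Saving per megabit of source data, in cents.

    ratio is compressed size divided by original size. The defaults are the
    5 and 30 cents-per-megabit figures from the text; treating them as exact
    values rather than bounds is an assumption of this sketch.
    """
    return net * (1.0 - ratio) - cpu


record = b"AB     CD"
tokens = compress_runs(record)
assert expand_runs(tokens) == record
```

Under these assumptions, compression pays for itself whenever the compressed record is smaller than five-sixths of the original, since 30 * (1 - 5/6) = 5.
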
The compressed mode of NETRJS uses an adaptation of the particular compression scheme which is incorporated in the "Multileaving protocol" of the binary synchronous rje support in IBM's HASP system. Although it isn't really necessary for the purpose of defining a compression scheme in FTP, I have included an appendix summarizing very briefly the nature of HASP and its rje package. That appendix may be considered cultural enrichment for those in the Network Community who have been denied the privilege of being an IBM customer. After all, I know a lot of HASP experts who never heard of



