It seems some East Coast researchers are pushing the envelope in storing information in DNA. They encoded a book, roughly 5 megabits of data, into oligonucleotides that were "synthesized by ink-jet printed, high-fidelity DNA microchips". I don't fully understand that phrase, but I presume it means "made into a glob of DNA". The authors then sequenced the DNA and recovered all the data, with 10 incorrect bits.
The four bases of DNA (A, G, C, T) seem to lend themselves to some interesting storage techniques. The authors built simple redundancy into their encoding: A and C both represented 0, while G and T both represented 1. This was apparently a departure from earlier attempts, which packed a pair of bits into each base. Having two bases to choose from for every bit made it easier to construct more robust sequences. I wonder if additional error handling could have been done by placing checksum bases at intervals along the strand? At two bits per base, two bases would provide 16 possible checksum values (only 4 under the authors' one-bit-per-base scheme), which seems like enough to cover a nice string of bits.
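To make the one-bit-per-base idea concrete, here is a minimal Python sketch. It is not the authors' actual pipeline. I am assuming the free choice between the two bases is used to avoid repeating the previous base, which is only my guess at one way to get "more robust" sequences. The function names, the seeded random choice, and the count-of-ones checksum packed into two bases at two bits per base are all hypothetical, there only to illustrate the checksum musing above.

```python
import random

ZERO_BASES = "AC"   # either base stands for a 0 bit
ONE_BASES = "GT"    # either base stands for a 1 bit
DENSE = "ACGT"      # hypothetical 2-bits-per-base alphabet for the checksum only


def encode_bits(bits, seed=0):
    """Map each bit to one base, never repeating the previous base."""
    rng = random.Random(seed)
    strand = []
    for bit in bits:
        choices = ONE_BASES if bit else ZERO_BASES
        # Two bases are available per bit, so at least one differs from the last.
        candidates = [b for b in choices if not strand or b != strand[-1]]
        strand.append(rng.choice(candidates))
    return "".join(strand)


def decode_bases(strand):
    """Recover the bits: A/C -> 0, G/T -> 1."""
    bits = []
    for base in strand:
        if base in ZERO_BASES:
            bits.append(0)
        elif base in ONE_BASES:
            bits.append(1)
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return bits


def checksum_bases(bits):
    """Count the 1 bits modulo 16 and pack the result into two bases."""
    value = sum(bits) % 16
    return DENSE[value >> 2] + DENSE[value & 0b11]


if __name__ == "__main__":
    payload = [0, 1, 1, 0, 0, 0, 1, 1, 1, 0]
    strand = encode_bits(payload)
    print(strand, checksum_bases(payload))
    print(decode_bases(strand) == payload)
```

A count-of-ones checksum like this would catch any single flipped bit in a block, but not a pair of flips that cancel out, so a real design would presumably want something stronger.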
The book that was encoded had 50,000 words and eleven pictures. With an average code space of 40 bits per word, the text comes to about 2 megabits, or roughly 40% of the total, with the images presumably making up most of the rest. Suppose that all ten bit errors were in one picture? It would be interesting to know how tightly compressed the images were. With a high compression factor, some of the bit errors might do substantial damage, but small changes to the compression might make a large difference in the visibility of any bit errors.
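A quick back-of-the-envelope check of those figures, using only the numbers quoted in this post. Treating a megabit as 10^6 bits, ignoring any markup or other overhead, and splitting the remainder evenly across the pictures are my own simplifying assumptions.

```python
WORDS = 50_000
BITS_PER_WORD = 40          # average code space per word, as estimated above
PICTURES = 11
TOTAL_BITS = 5_000_000      # "roughly 5 megabits", assuming 1 megabit = 10**6 bits
BIT_ERRORS = 10

text_bits = WORDS * BITS_PER_WORD
text_share = text_bits / TOTAL_BITS
image_bits = TOTAL_BITS - text_bits          # assumes everything else is image data
bits_per_picture = image_bits / PICTURES     # assumes equally sized pictures
error_rate = BIT_ERRORS / TOTAL_BITS

print(f"text: {text_bits:,} bits ({text_share:.0%} of the total)")
print(f"images: {image_bits:,} bits, about {bits_per_picture / 8 / 1000:.0f} kB per picture")
print(f"error rate: {error_rate:.0e} per bit")
```

On these assumptions the text is about 2 megabits, the pictures average a few tens of kilobytes each, and the error rate is about two bits per million.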
The authors say that DNA storage is dense, stable, and energy-efficient, but prohibitively expensive and slow to read and write compared to conventional storage. It will be fun to see how this technology evolves!