Essay On Data Duplication

Abstract—Data de-duplication is one of the most active topics in backup storage systems. It is a technique for detecting and eliminating duplicate data, and it also reduces the network bandwidth needed to transfer that data. De-duplication is applied mainly in permanent storage settings such as backup and recovery systems and data centers. Organizations of all sizes, from startups to large enterprises, maintain data centers in sectors such as finance, education, pharmaceuticals, and information technology, and de-duplication benefits them chiefly by lowering cost. Many algorithms exist for removing duplicate data, and each has its own shortcomings. Here we use the Byte-Index algorithm and the KMP search algorithm to eliminate duplicate data in primary storage.

Index Terms—Data, Duplication, De-duplication

Procedure involved in Byte Indexing
The objective is to transmit only the data that does not overlap with what the server already holds. In the proposed system, the server performs several tasks while it is idle, so that they are already complete when a de-duplication request arrives from a client. Specifically, the server indexes its files, calculates the chunk hashes, and stores them in the server database during idle time.
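
The idle-time preparation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the essay does not give a chunk size, a hash function, or a database schema, so `CHUNK_SIZE`, the use of SHA-1, the function names, and the in-memory dictionary `server_db` are all hypothetical stand-ins.

```python
import hashlib

# Hypothetical chunk size in bytes; the essay does not state one.
CHUNK_SIZE = 32


def index_server_file(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Split a file into fixed-size chunks and return (chunk_number, hash) records.

    SHA-1 is an assumption; the essay only says that hashes are calculated.
    """
    records = []
    for number, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        records.append((number, hashlib.sha1(chunk).hexdigest()))
    return records


# A plain dict stands in for the server database (assumption).
server_db = {}


def run_idle_indexing(file_name: str, data: bytes) -> None:
    """Work done while the server is idle, before any client request arrives."""
    server_db[file_name] = index_server_file(data)
```

Calling `run_idle_indexing("report.bin", contents)` ahead of time means that a later de-duplication request only has to read the stored chunk numbers and hashes instead of rescanning the file.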

When the server receives a de-duplication request from the client, it goes through the chunk index of the requested file, takes out each chunk number (chunk index), and puts it into the “Index-table”. Specifically, it distributes all the chunk numbers of the requested file from the database into the 256 × 256 table (the Index-table) mentioned in the previous sections. The server then sends the “Index-table” to the client. The client receives the “Index-table” and scans the bytes of its own file (the modified file), using the table to look for chunks that are highly likely to be duplicates. It then calculates hash values for the lookup results that are likely duplicates.
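
A minimal sketch of the Index-table exchange is given here. It rests on assumptions that this section does not confirm: a table cell is selected by the first two bytes of a chunk, chunks are fixed-size, and SHA-1 is the hash. The names `build_index_table` and `find_candidates` are hypothetical, and the table is built directly from the file bytes rather than from database records to keep the example short.

```python
import hashlib

# Hypothetical chunk size, kept tiny so the example is easy to follow.
CHUNK_SIZE = 4


def build_index_table(server_data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Server side: place every chunk number into a 256 x 256 table.

    Assumption: the cell is chosen by the chunk's first two bytes.
    """
    table = [[[] for _ in range(256)] for _ in range(256)]
    for number, start in enumerate(range(0, len(server_data), chunk_size)):
        chunk = server_data[start:start + chunk_size]
        if len(chunk) >= 2:
            table[chunk[0]][chunk[1]].append(number)
    return table


def find_candidates(client_data: bytes, table: list,
                    chunk_size: int = CHUNK_SIZE) -> list:
    """Client side: scan the modified file byte by byte.

    Wherever the two bytes at the current offset hit a non-empty cell,
    record (offset, candidate chunk numbers, hash of the client window).
    """
    candidates = []
    for pos in range(len(client_data) - chunk_size + 1):
        numbers = table[client_data[pos]][client_data[pos + 1]]
        if numbers:
            window = client_data[pos:pos + chunk_size]
            digest = hashlib.sha1(window).hexdigest()
            candidates.append((pos, list(numbers), digest))
    return candidates
```

For example, with server data `b"abcdWXYZ"` and client data `b"xxWXYZyy"`, the only candidate is at client offset 2 and points to server chunk 1. A table hit only marks a probable duplicate, which is why the client hashes the window; how those hashes are compared afterwards is not covered in this section.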
