^{1}

^{*}

^{2}

^{3}

^{1}

In this paper, the sticker based DNA computing was used for solving the independent set problem. At first, solution space was constructed by using appropriate DNA memory complexes. We defined a new operation called “divide” and applied it in construction of solution space. Then, by application of a sticker based parallel algorithm using biological operations, independent set problem was resolved in polynomial time.

DNA encodes the genetic information of cellular organisms. The unique and specific structure of DNA makes it one of the favorite candidates for computing purposes. In comparison with conventional silicon-based computers, DNA computers have massive degrees of miniaturization and parallelism. By recent technology, about 10^{18 }DNA molecules can be produced and placed in a test tube. Each of these DNA molecules could act as a small processor. Biological operations such as hybridization, separation, setting and clearing can be performed simultaneously on all of these DNA strands. Thus, in an in vitro assay we could handle about 10^{18 }DNA molecules or, we can say that 10^{18 }data processors can be executed in parallel.

DNA computing was initially developed by Leonard Adleman in 1994 [

In 1996, a new model of DNA computing (sticker model) was introduced by Roweis et al. [

In this paper, we applied sticker model for solving the independent set problem which is one of the NP-complete problems.

The paper is organized as follows. Section 2 introduces the DNA structure and various DNA computing models and discuss about Sticker based DNA computing and biological operations which are used in sticker model. Section 3 introduces a DNA based algorithm for solving the independent set problem in sticker model.

DNA is a polymeric molecule which is composed of monomers called Deoxyriboneucleotides. Deoxyriboneucleotides are building blocks of DNA and each of them contains three components: sugar, phosphate group and nitrogenous base. The sugar (deoxyribose) has five carbon atoms (1’ - 5’). The phosphate group is attached to the 5’ carbon of sugar and nitrogenous base is attached to the 1’^{ }carbon. There are four different nitrogenous bases which contribute in DNA structure: Thymine (T) and Cytosine (C) which are called pyrimidines; Adenine (A) and Guanine (G) which are called purines. Because nitrogenous bases are variable components of neucleotides, different neucleotides are distinguished by nitrogenous bases which contribute in their structure. For this reason the name of the bases are used to refer to the neucleotides, and the neucleotides are simply represented as A, G, C, and T. The neucleotides are link together by phosphodiester bonds and form single stranded DNA (ssDNA). A ssDNA molecule can be likened to a string consisting of a combination of four different symbols, A, G, C, T. Mathematically, this means we have a four letter alphabet ∑ = {A, G, C, T} to encode information. Two ssDNA molecules join together to form double stranded DNA (DsDNA) based on complementary rule: “A” from one strand bond to “T” from opposite strand, and “C” bond to “G”. In

DNA computing was initially developed by Leonard Adleman in 1994 [

In 1996, Roweis et al. [

2^{56} keys, which, at a rate of 100,000 operations per second, would take 10,000 years. In contrast, it was estimated that DES could be broken by using sticker based DNA computing in about 4 months.

Other than Adleman-Lipton and Sticker based models, other various models are also proposed in DNA computing by researchers. Quyang et al. [

The surface-based model was introduced by Liu et al. [

The sticker model was introduced by S. Roweis et al. [

Another conception in sticker model is (K, L) library. Each (K, L) library contains memory complexes with K bit regions, the first L bit regions are either on or off, in all possible ways, whereas the remaining K-L bit regions are off. The last K-L bit regions can be used for intermediate data storage. In every (K, L) library there are at least 2^{L} memory complexes. In

There are four principal operations in sticker model: combination, separation, setting and clearing [

1) Combine (T_{0}, T_{1}, T_{2}): the memory complexes from the tubes T_{1} and T_{2 }are combined to form a new tube T_{0}, simply the contents of T_{1} and T_{2} are poured into tube T_{0. }(T_{0} = T_{1} È T_{2}).

2) Separate (T_{0}, i) ® (T_{1}, T_{2}), this operation creates two new tubes T_{1} and T_{2}, T_{1 }contains the memory complexes having the i^{th} bit on (T_{1 }= +(T_{0}, i)) and T_{2} contains the memory complexes having the i^{th} bit off (T_{2 }= –(T_{0}, i)).

3) Set (T_{0}, i): the i^{th }bit region on every memory complex in tube T_{0 }set to 1 or turned on, or we can say the sticker for that bit is annealed to corresponded bit region on every memory complex.

4) Clear (T_{0}, i): the i^{th }bit region on every memory complex in tube T_{0} set to 0 or turned off, or we can say the stickers of that bit region must be removed (if present) from every memory complex in tube T_{0}.

5) Divide (T_{0}, T_{1}, T_{2}): by this operation, the contents of tube T_{0} is divided into two equal portions and poured into the tubes T_{1} and T_{2}.

In graph theory, an independent set of a graph G = (V, E),

where V is the set of the vertices and E is the set of the edges, is a subset V^{1} Í V such that no two vertices in V^{1} are joined by an edge in E. On the other hand, the independent set is a set of vertices such that for every two vertices, there is no edge connecting the two. The size of an independent set is the number of vertices it contains. A maximum independent set is a largest independent set for a given graph G and the problem of finding such a set is called the maximum independent set problem. The maximum independent set problem has been proved to be a NP-complete problem [

All of the independent sets for our graph are as follows: {V_{1}}, {V_{2}}, {V_{3}}, {V_{4}}, {V_{5}}, {V_{1}, V_{2}}, {V_{1}, V_{3}}, {V_{1}, V_{4}}, {V_{2}, V_{3}}, {V_{3}, V_{5}}, {V_{4}, V_{5}}, {V_{1}, V_{2}, V_{3}}

It is clear that the independent set of the maximum size is {V_{1}, V_{2}, V_{3}}, furthermore, the size of the independent set problem in our graph is 3.

First of all, it is essential to generate the sticker based DNA solution space of our problem. Then, basic biological operations are used to select legal strands and remove illegal strands from the solution space. It is obvious that a graph with N vertices has 2^{N} subset of vertices or 2^{N} possible independent sets. Furthermore, each possible independent set can be represented by an Ndigit binary number. If the i^{th} bit in an N-digit binary number is set to 1, it represents that the i^{th} vertex is in the independent set. If the i^{th} bit in an N-digit binary number is set to 0, it represents that the i^{th} vertex is not in the independent set.

Our graph has 5 vertices and 32 possible independent sets. All of these possible independent sets can be represented by a 5-digit binary number. For example the binary number 11100 represent the {V_{1}, V_{2}, V_{3}} or the binary number 11111 represent the {V_{1}, V_{2}, V_{3,} V_{4}, V_{5}}.

To represent all possible independent sets by appropriate DNA memory complexes, we introduce two methods for construction of solution space.

Method One:

In this method which is proposed by Roweis et al., [

first we provide enough amounts of memory strands with at least 5 bit regions. Then, the strands are split into two tubes A and B. To one tube (tube A) is added an excess amounts of all stickers and set all bits on. The unused stickers are then removed. The contents of tubes A and B are then mixed together and by rising temperature all annealed stickers dissociate from memory strands. Finally the mixture is cooled again, causing the stickers to randomly anneal to the memory strands. By this method, it is expected that approximately 63% of memory complexes will be produced. Of course, this percentage can obviously be increased by starting with more memory strands. The advantage of this method is that the solution space is produced at single step, but its main disadvantage is that some memory complexes may not be produced by above method.

Method Two:

1) Input (T_{0}), where T_{0} contains large amounts of memory strands with at least N (number of vertices in graph) bit regions.

2) For i = 1 to N, where N is number of vertices in graph a) Divide (T_{0}, T_{1}, T_{2})

b) Set (T_{1}, i)

c) Combine (T_{0}, T_{1}, T_{2})

End for Note: Method two has N divide, N set and N combine operations. At the end of procedure, tube T_{0} contains all of the memory complexes which each of them represent one of the possible independent sets. The main advantage of this new method is that the memory complexes representing all possible independent sets will be produced definitely. In our example, tube T_{0} contains 32 different memory complexes. The 32 memory complexes which are produced during construction of solution space are shown in

The following algorithm is proposed for solving the independent set problem:

1) For k = 1 to m, where m is the number of edges in the graph G a) Let e_{k} = (v_{i}, v_{j}), where e_{k} is one edge and v_{i}, v_{j} are vertices which are connected to each other by e_{k}

b) Bit regions i and j represents v_{i} and v_{j}, respectively.

c) Separate (T_{0}, i) → (T_{1}, T_{2})

d) Separate (T_{1}, j) → (T_{3}, T_{4})

e) Discard T_{3}

f) Combine (T_{0}, T_{2}, T_{4})

2) For i = 0 to n – 1 For j = i down to 0 Separate (T_{j}, I + 1) → (T_{(j+1)}_{’}, T_{j})

Combine (T_{j+1}, T_{j+1}, T_{(j+1)}_{’})

3) Read T_{n}; else if it was empty then:

Read T_{n–1}; else if it was empty then:

Read T_{n–2}; else if it was empty then:

Read T_{2}; else if it was empty then:

Read T_{1};

According to the steps in the algorithm, the independent set problem can be resolved by sticker based DNA computation in polynomial time. Any two vertices connected in the graph G cannot be the members of the same independent set; hence their corresponding bit regions cannot be set to 1. For example, we have four edges in our graph: (v_{1}, v_{5}), (v_{2}, v_{5}), (v_{2}, v_{4}), (v_{3}, v_{4}), therefore “1xxx1”, “x1xx1”, “x1x1x” and “xx11x” (x can be either 1 or 0) cannot be independent set and memory complexes representing them should be removed from solution space.

Step 1 of the algorithm is executed m times (m is 4 in our graph) and at each round of execution the memory complexes representing the subsets “1xxx1”, “x1xx1”, “x1x1x” and “xx11x” are removed from tube T_{0}. At the end of step 1, illegal memory complexes are removed from solution space and tube T_{0} only contains the memory complexes which represent legal independent sets. ({V_{1}}, {V_{2}}, {V_{3}}, {V_{4}}, {V_{5}}, {V_{1}, V_{2}}, {V_{1}, V_{3}}, {V_{1}, V_{4}}, {V_{2}, V_{3}}, {V_{3}, V_{5}}, {V_{4}, V_{5}}, {V_{1}, V_{2}, V_{3}}).

By the execution of step 2, the memory complexes without any annealed stickers are placed in tube T_{0}, the memory complexes with only one annealed sticker are placed in tube T_{1}, the memory complexes with 2 annealed stickers are placed in tubes T_{2}, the memory complexes with 3 annealed stickers are placed in tube T_{3}, and so on. Note, that the numbers of annealed stickers in every memory complex represent the number of vertices in corresponding independent set. For example, in Graph of our problem, the legal independent sets with one vertex ({V_{1}}, {V_{2}}, {V_{3}}, {V_{4}}, {V_{5}}) are placed in tube T_{1 }, and those with 2 vertices ({V_{1},V_{2}}, {V_{1},V_{3}}, {V_{1},V_{4}}, {V_{2},V_{3}}, {V_{3},V_{5}}, {V_{4},V_{5}}) are placed in tube T_{2} and finally the independent set with 3 vertices ({V_{1},V_{2},V_{3}}) is placed in tube T_{3}.

The legal memory complexes which remain in tube T_{0 }at the end of step 1 of the algorithm and their final places at the end of step 2 are shown in

In step 3, all of the tubes (from T_{n} to T_{1}) are evaluated for presence of memory complexes, and the first tube which is not empty and contains memory complexes represent maximum independent set. In our example, tubes T_{5} and T_{4} are empty and devoid of any memory complexes. The first tube which contains DNA molecules is tube T_{3}. As mentioned before, the memory complexes in tube T_{3} have 3 annealed stickers; therefore, the maximum independent set in our graph is 3.

In this paper, the sticker based DNA computing was used for solving the independent set problem. This method could be used for solving other NP-complete problems. There are four principal operations in sticker model: Combination, Separation, Setting and Clearing. We also defined a new operation called “divide” and applied it in construction of solution space. This new method of construction of solution space has some advantages over other methods. The main advantage of this new method is that the memory complexes representing all possible independent sets will be produced definitely.

Among the four principal operations in sticker model, “Clearing” operation is the most problematic and there are some difficulties and limitations in perfect execution of that. For this reason, we have not used this operation in our algorithm.