# Open Problems

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

The following page lists interesting topics and open problems contributed by workshop participants. Please continue to add to this list. You can indicate if you are interested in working on a particular topic by adding your name to the interested participants list. In the wiki spirit, please also correct any mistakes or add additional clarity when helpful.

Note: $\LaTeX$ notation should render as expected. (e.g., $A + B \overset{k_1}{\rightarrow} C$)

# Relating space-bounded, logically reversible Turing machines with CRNs

### Overview

The overarching problem here is to understand how reversible CRNs relate to other reversible computing models, particularly space bounded reversible Turing machines (TMs). If you want to develop a deeper understanding of reversible computational complexity classes and work on an interesting problem with a level of technical difficulty that is amenable to making good progress within a week, then this might be the problem for you.

### Context

There is a rich history of research on reversible space bounded Turing machines. Following his seminal results on time-bounded, logically reversible TMs, Bennett (1973) asked whether TM-SPACE$(s) =$ reversible-TM-SPACE$(s)$. Fifteen years later he made some progress towards this, showing that TM-SPACE$(s) \subseteq$ reversible-TM-SPACE$(s^2)$. Many others worked to improve Bennett's result; finally Lange, McKenzie and Tapp (2000) answered Bennett's 1973 question in the affirmative. More recently, Thachuk et al. described what it means for a DNA strand displacement system (DSD) to use space (or equivalently volume) efficiently. Roughly, just as memory-efficient TMs reuse memory, space-efficient DSDs "recycle" strands by running reactions in both the forward and reverse directions. We also introduced restricted CRN models that can be compiled into DSDs in a space-preserving way.

While our CRN models have plausible physical realizations as DSD's, it is not clear that logically reversible, space-bounded reversible TMs have such physical realizations that preserve space. Roughly, the problem is that such TMs may "run" all of their transitions in the "forward" direction (rather than in both the forward and backwards directions, as do our recycling CRNs and DSDs), and so when one tries a straightforward approach to "compile" these space-bounded TMs into DSDs, the space (or volume) blows up exponentially. Can this exponential space blow-up be avoided?

### Concrete problem

One concrete first step to tackling this would be to study Bennett's 1989 construction. The description is at a fairly high level; it has an elegant and simple recursive structure that should be easily amenable to analysis. Is there a lower-level description, or "implementation" at the TM transition level, in which it is clear that transitions run alternately in the forwards and backwards direction? Progress on this should provide a nice strengthening of traditional complexity-theoretic results on space-bounded logically reversible computation and link this traditional work with current work on CRNs.

References.

Bennett 1973. Logical Reversibility of Computation http://www.math.ucsd.edu/~sbuss/CourseWeb/Math268_2013W/Bennett_Reversibiity.pdf

Condon et al. 2012. Less haste, less waste: On recycling and its limits in strand displacement systems http://rsfs.royalsocietypublishing.org/content/early/2012/02/08/rsfs.2011.0106.short

Thachuk, C., & Condon, A. 2012. Space and energy efficient computation with DNA strand displacement systems. http://www.cs.ubc.ca/~condon/papers/dna-pspace.pdf

### Problem context

Biological organisms are exemplars of self-organisation. Mathematical approaches like reaction-diffusion systems attempt to capture geometrical self-organization --- how complex patterns can arise from uniform initial conditions. Analogously we ask how well-mixed chemical systems can transform uniform initial conditions to controlled amounts of desired species. Having certain species present in precise counts (eg a species $L$ with exactly 1 molecule) is a prerequisite for a number of sophisticated CRNs in the literature. For example, constructions for polynomial-time Turing machine simulation requires a "leader": a species with initial count 1, which is used to coordinate interaction among the other species in a controlled way [1].

If a CRN starts with uniform initial conditions where only "high-count" species are initially present (e.g., count $n$ of some species $X$ in volume $n$), then it is possible to "elect" a leader through two reactions:

$X \to L$

$L + L \to L$

However, this takes expected time $O(n)$, which is "slow" compared with other basic forms of CRN computation. For example, for example, the reaction $X \to Y + Y$ "quickly" computes multiplication by 2 in the sense that it takes expected time $\log(n)$ to convert $n$ copies of $X$ into $2n$ copies of $Y$. The open question is whether a leader can be elected quickly (say in polylogarithmic time, or even sublinear time).

Algorithms for leader election have been extensively studied in the distributed computing literature. The possibility of fast leader election is a long standing open problem in "population protocols," a model which is formally equivalent to a subclass of CRNs. Angluin et al ("Fast Computation by Population Protocols With a Leader") developed an algorithm which appears to elect a leader in sublinear time with high probability based on extensive numerical simulation. Although the complexity of the population protocol has so far hindered efforts to prove correctness, it informs our understanding of what may or may not be possible, and suggests new techniques for a positive result. Proving either a positive or negative result would be an important development outside of the CRN literature and would engage the larger CS theory community.

### Problem overview

Question: Is there a CRN with an initial state with count $n$ of a species $X$ in volume $n$ and count 0 of everything else, that in expected time $\log^k(n)$ for some constant $k$, reaches a state $\vec{c}$ with count 1 of $L$ (and arbitrary counts of everything else), such that every state reachable from $\vec{c}$ has count 1 of $L$?

### References

Dana Angluin, James Aspnes and David Eisenstat. Fast Computation by Population Protocols With a Leader. Distributed Computing. [1]

Dana Angluin, James Aspnes and David Eisenstat. A Simple Population Protocol for Fast Robust Approximate Majority. Distributed Computing. [2]

Etienne Perron, Dinkar Vasudevan, and Milan Vojonvic. Using Three States for Binary Consensus on Complete Graphs TR, September 2008. [3]

David Doty. Timing in chemical reaction networks, SODA 2014. [4]

# CRNs augmented with polymers

### Problem context

Chemical reaction networks are known to be Turing-universal with probability $1-\epsilon$ (for any $\epsilon>0$) [[1].] CRNs augmented with a constant number of stacks are Turing-universal [2]. This was demonstrated using a DNA strand displacement system construction that exploited the fact that DNA can easily form polymers. The stacks in that construction behave as one would expect: a molecule can be added to the top of the stack, can be removed from the top of the stack, and molecules not on the top are not accessible to other reactions. A stack of type $i$ initiates from a special subunit molecule called $\bot_i$. Simplifying details from [2], this leads to CRN reactions of the form:

\begin{align} [\bot_i \ldots X] + Y &\rightarrow [\bot_i \ldots XY]\\ [\bot_i \ldots XY] + POP_i &\rightarrow [\bot_i \ldots X] + Y \end{align}

#### Initial context assumption

As with many results that demonstrate CRNs capable of complex computation [1][3], the stack machine construction in [2] assumes that certain molecular species are present in exact initial quantities, in particular there is exactly one copy of each stack.

### Problem overview & questions

What is the power of CRNs augmented with stacks, without the initial context assumption? (i.e., the concentrations of stack subunit species, $\bot_i$, are given as part of the input.) The inspiration of this question can be found in the ability for DNA strand displacement systems to form polymers (complexes consisting of multiple DNA strands bound via hybridization of complementary domains). While polymers can simulate stacks, they are more general and open up new possibilities for programming CRNs.

Interesting thought experiments in this model may include:

• Beginning from a population of input $X$, can stacks be used to output $\log(|X|)$ copies of an output species $Y$?
• Could consensus of two input species $X$ and $Y$ be computed exactly, and efficiently, with the use of a constant number of stack types?
• What additional computational power, if any, arises when the model is broadened to included:
• double ended stacks / queues
• stacks/queues/polymers that can be efficiently concatenated with another, at their ends
• polymers that can be efficiently "merged" with another, at multiple locations along the polymer

### CRNs with complexation/decomplexation

In a 2010 paper "Turing universality of the Biochemical Ground Form" Cardelli and Zavattaro define an extension of (an equivalent language to) CRNs with operations of complexation and decomplexation, and show that the extended language (BGF) is Turing universal by emulating a register machine (where registers are stacks). But in BGF stacks are not built-in, and BGF can represent pretty much any complex dynamic polymer with a finite program.

Questions:

• Can this be useful as a formal language (analogously to CRN) to investigate expressibility questions involving polymerization?
• Is there a general DSD implementation of BGF (using DNA's ability to form complexes), the same way there is a DSD implementation of CRN?

# CRNs and quantum mechanics

### Context

For small numbers of molecules, time evolution for a CRN is described by the chemical master equation: this says how the probability of having any number of molecules of each given species changes with time. A very similar formalism appears in quantum mechanics, but with 'complex amplitudes' replacing nonnegative real probabilities. As a result, we can take techniques from quantum mechanics and use them to give new proofs of results in CRN theory:

Conversely, we can take ideas from CRN theory and see if they apply to quantum mechanics.

### Problem overview and questions

At this stage, the main challenge is to identity really fruitful directions for research. So, this may be a good thing to work on when people are in the mood for brainstorming rather than solving precisely posed problems.

Some sample questions include:

• Can we think about quantum computation with interacting particles as analogous to CRN computation? The big difference is that amplitudes are complex, giving rise to destructive interference, so if a reaction can occur in two ways, the amplitudes for these two ways can (perhaps partally) cancel out. This has no analogue with probabilities, and in existing work on quantum computation this interference phenomenon is very important: it allows a computer to 'try many options simultaneously' and then have the results cancel out except for the desired ones. Nonetheless, there may be an area of overlap between quantum computation and CRN computation.
• Can we push forward the existing lines of research on using quantum techniques to study CRNs? Examples include:
• studying how the chemical master equation approaches the rate equation in the large-number limit.
• studying stable equilibria for certain classes of CRNs: for example, those with deficiency zero (or low deficiency).

However, I bet we can come up with better questions!

# CRNs and circuit complexity

### Problem context

Consider CRNs in which we give input using the binary convention: to specify the $i$'th bit is 1, have count 1 of $X_i$ in the initial state; otherwise have count 0.

### Problem overview

High-level question: What is the relationship between resource bounds (e.g., number of species/reactions) for CRNs and resource bounds (e.g., number of gates, depth) for Boolean circuits?

Precise question (one example): Let $n \in \mathbb{N}$ and let $\phi:\{0,1\}^n \to \{0,1\}$ be a Boolean function with $n$ inputs. What is the relationship between the Boolean circuit size $\textrm{size}_{BC}(\phi)$ of $\phi$ (the minimum number of wires in a Boolean circuit with AND, OR, and NOT gates that computes $\phi$) and the "CRN size complexity" $\textrm{size}_{CRN}(\phi)$ of $\phi$? (the minimum number of species/reactions in any CRN that stably computes $\phi$)?

Here, "the CRN stably computes $\phi$" is formally defined in several papers, e.g., [4]. Briefly, it means that there are species $Y$ ("yes") and $N$ ("no"), and starting from the initial state $\vec{x}$, for any state $\vec{c}$ reachable from $\vec{x}$, there is a state $\vec{o}$ reachable from $\vec{c}$ with the following property:

• If $\phi(\vec{x}) = 1$ then $Y$ is present in $\vec{o}$ and $N$ is absent, and if $\phi(\vec{x}) = 0$ then $N$ is present in $\vec{o}$ and $Y$ is absent, and this remains true in any state reachable from $\vec{o}$.

It is clear that $\textrm{size}_{CRN}(\phi) \leq O(\textrm{size}_{BC}(\phi))$. The reverse direction is not clear.

Some preliminary work suggests perhaps $\textrm{size}_{CRN}(\phi)$ could be characterized in terms of the depth of Boolean circuits that compute $\phi$.

# Shy molecules

Some input molecules are shy, they hide in the corner until our CRN computation thinks its done. Then they show up and the CRN should realise it has the wrong answer! For example consider a CRN that is given as input count $n\in\mathbb{N}$ of some input species, and it (erroneously) "reads" only $n-1$ of these and then presents an answer; clearly this answer is premature and possibly incorrect. Another kind of input format where this issue arises is where we give the input to the CRN that computes on binary strings as a positive indicator species for each input symbol that is $1$, e.g. the input string $x = 01001010$ would be presented as count $\geq 1$ of each of the molecules $1_2, 1_5, 1_7$ (these input species are sometimes called indicator species; they indicate which bits are on).

Often, to get around this issue we (theoretically or in the lab) provide CRN inputs that contain both positive and negative indicator species (sometimes called dual rail inputs). For example, the input string $x = 01001010$ would be presented as count $\geq 1$ of each of the molecules $0_1, 1_2, 0_3, 0_4, 1_5, 0_6, 1_7, 0_8$.

Vague question: What does dual-rail input buy us? What does it cost us?

For example, suppose we want a strict notion of halting with an answer, which we here call announcement stable and define as: the CRN outputs by reaching a configuration $c$ with a single answer molecule (one copy $z$ of a molecule from the set $\{\text{yes}, \text{no}\}$) and every configuration reachable from $c$ gives the same answer (contains $z$ and contains zero copies of the other answer molecule). Note that this is stricter than output stable, defined elsewhere on this page. Dual rail inputs seem to more easily facilitate designing of CRNs that are announcement stable.

One can consider this question both in terms of families of CRNs (see elsewhere on this page) computing a function, or a single CRN, computing a function.

Slighly more precise question: Consider the class of uniform families of announcement stable, and uniform families of output stable CRNs. What do we pay, in terms of CRN complexity (size, time, etc.), for choosing one output convention over another for solving general classes of problems?

The following related (and very tentative!) question asks if we can have a CRN model where we are forced to deal with input counts, but where we can be clever and cheaply achieve announcement stability.

Question: Consider a family of CRNs, that compute functions on $n$-bit strings, but where the number of input species to the CRN is $< n$, say $O(\log\log n)$, or $O(\log \log \log\log\log\log n)$ (so we are forced to put "most" of the input into initial counts). Can we handle shy molecules here? More precisely, in this model, can we compute in an announcement stable way, without paying significant costs?

Another related question: By allowing a non-constant, but small ($o(\log log n)$), number of input species, can we compute faster than with a constant number? What is the trade-off between speed of computation and number of input species?

# Computing partial functions

Update: After typing all this out, I (Dave Doty) realized that because $\mathbb{N}^k$ is well-quasi-ordered, it is impossible to have an infinite subset of $\mathbb{N}^k$ with all elements being pairwise incomparable. I'm leaving the question in the hopes that some discussion can happen, but I don't know how to formalize a question about the complexity of partial functions with finite domains. (Maybe circuit complexity can help here?)

### Problem context

The standard notion of "stable computation" with CRNs[4] says that a CRN computes a function $f:\mathbb{N}^k \to \mathbb{N}$ (the concept also makes sense with computing predicates $\phi:\mathbb{N}^k \to \{0,1\}$) if, for every state $\vec{c}$ reachable from the initial state $\vec{x}$, there is a correct "output stable" state $\vec{o}$ reachable from $\vec{c}$. Here, to say that $\vec{o}$ is output stable means that the count of the output species $Y$ cannot change (no reaction producing or consuming $Y$ is possible in $\vec{o}$ or in any state reachable from $\vec{o}$). To say that it is correct means that the count of $Y$ is equal to the intended output $f(\vec{x})$.

One property to observe is that the definition does not capture any notion of the CRN "knowing" when it has computed the correct answer. In the case of functions, the count of $Y$ changes as reactions occur, and eventually it stops changing, but there is no signal that the CRN gives to indicate that the count of $Y$ has stabilized. In the case of predicates, some of the species "vote" for a YES or NO answer, and eventually all the voters are unanimous and correct, but the CRN does not signal that the voters have stabilized.

One could imagine making an additional requirement of the CRN: there is a special molecule $H$ (the "halt" molecule), so that the count of $H$ is initially 0, the CRN is guaranteed eventually to produce $H$, and it is guaranteed that $H$ will not be produced until the answer has stabilized to the correct value. This would enable CRNs to be more easily composed, for example, since a downstream CRN could have its computation triggered by the $H$ molecule of an upstream CRN, to guarantee that they do not execute simultaneously and interfere with each other.

Unfortunately, in the case of total functions, this is impossible for stable computation of nontrivial functions or predicates. In particular, if there are two inputs $\vec{x}_1 < \vec{x}_2$ such that $f(\vec{x}_1) \neq f(\vec{x}_2)$, then there is a sequence of reactions from $\vec{x}_1$ that leads to state $\vec{o}_1$ with $\vec{o}_1(H) > 0$, and since $\vec{x}_1 < \vec{x}_2$, that sequence of reactions is applicable to $\vec{x}_2$, but since $f(\vec{x}_1) \neq f(\vec{x}_2)$ this means it is possible to produce $H$ from $\vec{x}_2$ before the CRN has stabilized to the correct answer $f(\vec{x}_2)$.

However, if $f:\mathbb{N}^k \dashrightarrow \mathbb{N}$ is a partial function, then it is possible for $\mathrm{dom}\ f$ to have all its elements be pairwise incomparable: $(\forall \vec{x}_1, \vec{x}_2 \in \mathrm{dom}\ f)\ \vec{x}_1 \not < \vec{x}_2$. So-called "dual-rail" representations of inputs to Boolean predicates $\phi: \{0,1\}^k \to \{0,1\}$, in which the $i$'th bit is represented in the initial state $\vec{x}$ by two species $1_i$ and $0_i$: $\vec{x}(1_i) = 1$ and $\vec{x}(0_i) = 0$ to represent that the $i$'th bit is 1, and $\vec{x}(1_i) = 0$ and $\vec{x}(0_i) = 1$ to represent that the $i$'th bit is 0. Therefore the set of CRN input states (a vector in $\N^{2k}$ since there are two species to represent each of the $k$ input bits to $\phi$) is pairwise incomparable.

In particular, any Boolean predicate $\phi: \{0,1\}^k \to \{0,1\}$ on a fixed number of input can be stably computed by a CRN, producing molecule $H$ precisely when the output stabilizes. With a standard "indicator" representation (one species $1_i$ to represent the $i$'th input bit), this is impossible for some predicates due to the pairwise comparability of some input states.

### Problem statement

Question 1: What is the complexity of partial functions $f:\mathbb{N}^k \dashrightarrow \mathbb{N}$ and predicates $\phi:\mathbb{N}^k \dashrightarrow \{0,1\}$ that can be stably computed by a CRN?

Question 2: What is the complexity of partial functions $f:\mathbb{N}^k \dashrightarrow \mathbb{N}$ and predicates $\phi:\mathbb{N}^k \dashrightarrow \{0,1\}$ that can be stably computed by a CRN, with the "announcement of halting" convention described previously? (i.e., the CRN produces molecule $H$, but not until the output has stabilized to the correct answer)

In both cases, the CRN is allowed to have undefined behavior on any initial state $\vec{x}$ not in the domain of the function/predicate. This is similar to the computational complexity notion of a promise problem, in which the Turing machine is "promised" to be given only inputs in the domain of the function and is allowed to be undefined -- e.g., not halt -- on other inputs.

The answer to both questions may depend on the complexity of the domain $\mathrm{dom}\ f$. When the function is total, then the answers for functions and predicates are known to be semilinear functions and semilinear predicates, respectively. Obviously, one can define a partial predicate $\phi:\mathbb{N}^k \dashrightarrow \{0,1\}$ such that $\mathrm{dom}\ \phi$ is very complex but $\phi(\vec{x})=1$ for all $\vec{x} \in \mathrm{dom}\ \phi$. Then the support of $\phi$ (the set $\{\vec{x}|\phi(\vec{x})=1\}$) would also be complex, but intuitively we consider $\phi$ to be very simple since it is partitioning $\mathrm{dom}\ \phi$ in a trivial way.

Intuitively, the questions (in the case of predicates) are asking: "can $\phi$ partition the set $\mathrm{dom}\ \phi$ in a way 'more complex than semilinear' if it is allowed to have undefined behavior on inputs $\vec{x} \not \in \mathrm{dom}\ \phi$?"

# Families of CRNs

### Context

Models of computation are often presented either as (i) a single device that works for all inputs (e.g. a Turing machine, a Python program), or (ii) a family of devices where we have one device for each input length (e.g. a family of Boolean circuits with exactly one circuit $c_n$ for each input string length $n$). The relationship between standard models and CRNs where in both models we have (i) one device for all inputs is beginning to be well-understood (but still has interesting questions), a number of the questions above are best formalized in the (ii) family of devices setting. The purpose of this section is to describe how the latter setup might look.

Families of devices can be uniform or nonuniform: we say that a family of devices is uniform if there is an algorithm that describes the entire family. For example, we say that a language $L$ is accepted by a uniform Boolean circuit family $\mathcal{C}$ if there exists a computable function $f : \{1\}^* \rightarrow \mathcal{C}$ such that $$\mathcal{C} = \left\{ c_n \mid \forall n\in\mathbb{N}, f(1^n) = c_n \text{ is a circuit that accepts } \{0,1\}^n \cap L \right\}.$$

The distinction between single-machine versus families of devices has appeared in work on CRNs. For example, here are two example ways for a CRN to compute a function of its input.

• The input could be the initial count of some single species $X$. Then any natural number $n$ can be represented, or equivalently one could represent a binary string $x$ by using natural number $n$ as input, if $x$ is the $n$'th binary string in lexicographic order.
• The input could be a binary string $x$ of length $n$ represented by $n$ different input species $X_1,\ldots,X_n$. The presence or absence of each of these species is then able to represent any bit string of length $n$. Indeed, for each $n \in\mathbb{N}$ there is exactly one CRN, and we call the entire infinite set of CRNs a family.

### Problem overview

High-level question: What is the relationship between uniform families of CRNs and uniform families of Boolean circuits, asynchronous Boolean circuits, or other "standard" models?

Example concrete questions appear elsewhere on this page (one can ask about CRN size, time, restricted forms of input and output, and many other questions in this setting). In order not to trivialize the theory, when we have an infinite set of devices computing some function or deciding a language, it is important to set up clear input and output conventions, and to define how powerful the individual devices can be. We propose the following definition.

### Definition: Language acceptance by a uniform family of CRNs

Let $\mathcal{R}$ be a set, called a family, of CRNs. We say that a language $L$ is accepted by a uniform family of CRNs $\mathcal{R}$ if there exist two functions: (1) $f : \{1\}^* \rightarrow \mathcal{R}$ and (2) $e$ (called the input encoder) with domain $\{ 0,1 \}^*$ and range the set of all multisets over the (input) species of the CRNs, and $$\mathcal{R} = \{ r_n \mid \forall n\in\mathbb{N}, f(1^n) = r_n \text{ is a CRN that accepts the input } e(x) \text{ for all } x \in \{0,1\}^n \cap L \}.$$

Some comments: Usually, we require that $f$ and $e$ are very simple (e.g. logspace computable) in a complexity theoretic sense, as we want the CRN to do the work and not (say) the input encoder $e$. As an example, $e(x)$ could be the indicator species for the string $x$; but we want to consider other input formats, including counts. One could have a similar definition of uniform families of CRNs that compute functions. If there are no (computability) restrictions on $e$ and $f$ we say the family is nonuniform; this incredibly powerful model finds most use in proving lower bounds.

# Composability of CRNs

### Context

Suppose we wish to compose two computing CRNs whose input and output are the same format (e.g., both are Boolean, or both are real numbers), so that the output of an upstream CRN $\mathcal{U}$ can be supplied as input to a downstream CRN $\mathcal{D}$. In contrast to more traditional models of computation, composition in CRNs is not straightforward to implement, since there is in general no way for $\mathcal{U}$ to signal that it is finished computing so that $\mathcal{D}$ can remain "off" until $\mathcal{U}$ is done.

Several tricks in different models of CRN computation have been developed for enabling composition:

• Monotone production of outputs: If $\mathcal{U}$ only produces but never consumes its output, then it may be composed with $\mathcal{D}$ unmodified. This is because $\mathcal{D}$ must already be designed to account for the fact that it may take arbitrarily long to encounter all of its input. From the point of view of $\mathcal{D}$, there is no difference between a "shy" input that takes a very long time until its first reaction, and an input that will be produced eventually by $\mathcal{U}$ but simply has not yet been produced. Also, since $\mathcal{U}$ never consumes its output, $\mathcal{D}$ will not interfere with $\mathcal{U}$ by consuming $\mathcal{U}$'s output. Nonmonotone functions such as $f(x_1,x_2) = x_1-x_2$, although computable by a CRN such as $X_1 \to Y;\ \ X_2 + Y \to \emptyset$, cannot be computed by a CRN that does not consume its output. In these cases, a common trick is to use a "dual-rail" representation, representing the output (and in the case of $\mathcal{D}$, the input) by two species $Y^+$ and $Y^-$, such that we interpret the "output" of the CRN to be the difference between the two counts $\# Y^+ - \# Y^-$. In this case, it is possible to decrease the output by producing $Y^-$, enabling monotone production of output species even for functions that are not monotone. (In the example, the reactions $X_1 \to Y^+$ and $X_2 \to Y^-$ compute the function $f(x_1,x_2) = x_1-x_2$.)
• Halting signal: If $\mathcal{U}$ is able to sense that it is done, then this gives an easy way to compose. More precisely, if there is a species $H$ that is absent initially, and $\mathcal{U}$ can be designed to produce $H$ only if the output is correct, then by letting $H$ "activate" $\mathcal{D}$ (e.g., by being a catalyst in all of $\mathcal{D}$'s reactions, or by a reaction $H \to L$, where $L$ is necessary to initiate the first reaction of $\mathcal{D}$), $\mathcal{D}$ can remain "off" until $\mathcal{U}$ is done, and they will not interfere with each other.
• Catalytic inputs: This is, in a sense, complementary to the "monotone production of outputs" technique. If $\mathcal{D}$ only uses its input species as catalysts (thus never altering their counts), then $\mathcal{D}$ cannot interfere with $\mathcal{U}$ since $\mathcal{D}$ cannot alter the species they share in common. However, this means that $\mathcal{D}$ cannot deterministically sense how much of its inputs it has, only whether they are present or absent. (It could also stochastically sense their amounts since the rate of a reaction like $X_1 + A \to X_1 + B$ would be higher if $X_1$ has higher count.) An example of this idea in action is the following generic way to create any two-input, one-output Boolean gate with four reactions (the reactions essentially give the truth table of the gate). For example, to compute AND (call the input wires $x$ and $y$ and the output wire $z$): $$0_x + 0_y + 1_z \to 0_x + 0_y + 0_z \\ 0_x + 1_y + 1_z \to 0_x + 1_y + 0_z \\ 1_x + 0_y + 1_z \to 1_x + 0_y + 0_z \\ 1_x + 1_y + 0_z \to 1_x + 1_y + 1_z$$ In other words, if two inputs encounter the wrong output, they catalytically convert it to the correct output.

### Problem statement

Give a general construction to make a CRN computing a function composable. The details of what that means precisely will depend on the precise definition of "a CRN computing a function".

# Reachability in real-valued CRNs

### Context

Assume the CRN has $k$ species, so that a state of the CRN is a vector $\vec{c} \in \mathbb{R}^k_{\geq 0}$.

In the discrete model of CRNs, there is a clear and incontestable definition of what it means for one state $\vec{d} \in \mathbb{N}^k$ to be reachable from another state $\vec{c} \in \mathbb{N}^k$: there is a finite sequence of states $\vec{c} = \vec{c}_0,\vec{c}_1,\ldots,\vec{c}_{n-1},\vec{c}_n = \vec{d}$ such that each $\vec{c}_i$ results from applying a single reaction to state $\vec{c}_{i-1}$ (and that reaction is applicable: all the reactants are present in sufficient count to execute the reaction). The definition of reachability forms the basis for several results and other definitions in discrete CRNs. In particular, it captures the notion that it is possible for the CRN to go from state $\vec{c}$ to state $\vec{d}$, even though it may be unlikely to do so.

In the real-valued model, it is not clear what it means for one state to be reachable from another. The real-valued model is often called the "deterministic" model, but this is typically when mass-action kinetics are used (which, being described by differential equations with a unique solution, are inherently deterministic). However, we would still like to describe what it might mean to claim that the CRN could (if reactions happened at some rates other than that described by mass-action kinetics) reach from state $\vec{c}$ to state $\vec{d}$, while obeying the intuitive constraint that it should be impossible for some reaction to occur with positive rate in any state in which some of its reactants have concentration 0. Another goal is that it should contain known rate laws as special cases: if some rate law says that the CRN will reach from state $\vec{c}$ to state $\vec{d}$, then the generalized definition should agree that $\vec{d}$ is reachable from $\vec{c}$.

Here is one reasonable looking definition. We say that $\vec{d}$ is curve-reachable from $\vec{c}$ if there is a continuous curve in concentration space (a continuous function $\rho:[0,1] \to \mathbb{R}^k_{\geq 0}$) such that

1. $\rho$ is differentiable (i.e., there is a well-defined tangent vector at all points along the curve).
2. For each $t \in [0,1]$ (which we refer to as "time" $t$, although this termininology is taken from the vocabulary of parameterized curves and is not intended to describe "real time", as this is not a kinetic model), $\rho'(t)$ (the derivative of $\rho$ at time $t$, which is the tangent vector at time $t$) is expressible as $\rho'(t) = \sum_{i=1}^\ell f_i(t) \vec{v}_i$, where $\vec{v}_i$ is the $i$'t reaction vector (the product vector minus the reactant vector for the $i$'th reaction), and $f_i(t) \geq 0$ is a nonnegative real number (the "instantaneous flux" through the $i$'th reaction at time $t$). In other words, at every point along the curve, the motion in the direction of the curve is interpretable as "run the reactions at some nonnegative rates".
3. If $f_i(t) > 0$, then for all reactants $X$ of the $i$'th reaction, $\rho(t)(X) > 0$. In other words, if the flux through the $i$'th reaction is strictly positive at time $t$, then all reactants of that reaction must actually be present at time $t$.

This definition is appealing because all known rate laws, such as mass-action, Michaelis-Menten, and Hill function kinetics, trace out curves that obey the stated properties. However, the definition is not reasonable, as the following example shows.

Suppose we have a single species $X$, an empty initial state $\emptyset$, and the reaction $X \to 2X$. Intuitively, the reaction should mean that we need some $X$ to produce more $X$, and since we start with no $X$, there should be no states reachable from $\emptyset$ except $\emptyset$ itself. However, consider the curve $\rho:[0,1] \to \mathbb{R}$ defined by $\rho(t) = t^2$. It actually obeys all three properties, yet it testifies that the state with concentration 1 of $X$ is curve-reachable from $\emptyset$.

We can state more precisely what is wrong with the definition. Define a siphon to be a subset $\Omega$ of species such that every reaction with an element of $\Omega$ as a product also has an element of $\Omega$ (possibly a different one) as a reactant. Intuitively, "to produce something in $\Omega$ you need something in $\Omega$". It may seem "obvious" that this is equivalent to stating that "the absence of $\Omega$ is forward-invariant": if state $\vec{c}$ has concentration 0 of all species in $\Omega$, then every state $\vec{d}$ reachable from $\vec{c}$ also has concentration 0 of all species in $\Omega$. However, in the case of mass-action reachability, this implication is a deep theorem that requires a sophisticated proof. And in the case of curve-reachability, it is false. In fact, it is reasonable to say that the failure to enforce the "forward-invariant absence of siphons" is precisely what is wrong with the definition of curve-reachability. Property 3 was intended to enforce this condition, but it failed.

### Problem statement

Give a definition of reachability in real-valued CRNs based on continuous curves in $\mathbb{R}^k$, such that curves traced out by all known rate laws are special cases of it, and such that it respects the forward-invariant absence of siphons, as well as the intuitive idea that "motion along the curve should be interpretable as each reaction happening at some nonegative rate".

There is a definition of reachability in [5]. It is based on finite numbers of straight-line segments. It obeys the "forward-invariant absence of siphons" constraint, and it also enforces that each straight line is interpretable as running some reactions. However, finite unions of straight line segments cannot describe smooth curves such as those produced by mass-action kinetics. We did prove that this notion of reachability is more general than mass-action: if $\vec{d}$ is mass-action reachable from $\vec{c}$ (under some setting of rate constants, and $\vec{d}$ could be reached either in finite time, or in the limit as time goes to infinity), then $\vec{d}$ is reachable from $\vec{c}$ via a finite number of straight-line segments under our definition. But it would be more elegant if the actual curves used in the definition had the curves traced out by mass-action and other rate laws as special cases (which would immediately imply that reachability under those rate laws implies reachability under the more generalized notion).

# Is stable computation decidable?

### Context

In order for a CRN to have a deterministic output (for given input), the CRN should stably compute. The question is whether this property is decidable for CRNs.

### Problem overview

See the definition of stably computes from problem "CRNs and circuit complexity" above. The question is this: Is it decidable whether or not a given CRN stably computes?

So, is it decidable whether for any configuration $\vec{c}$ reachable from an initial configuration, there is a configuration $\vec{o}$ reachable from $\vec{c}$ such that:

• configuration $\vec{o}$ has a $Y$ (yes) species present and no $N$ (no) species present, or vice versa, and
• this property holds for all configurations reachable from $\vec{o}$?

# Experimentally motivated questions about CRNs and CRN-to-DNA technologies

### Context

One can write down “abstract” or "formal” CRNs which are interesting in many ways, such as: exhibiting complex temporal dynamics under the mass action model (oscillations, chaos, ...) or performing computation with counts under the stochastic model. Can we implement any interesting CRN we can write down in a real test tube with real molecules?

David Soloveichik et al. (http://www.pnas.org/content/early/2010/02/25/0909380107.abstract) showed that, in theory, it is possible to use DNA strand displacement to approximate the dynamics of any arbitrary formal CRN, with arbitrary accuracy, up to scaling rate constants (and in certain concentration regimes.) Luca Cardelli proposed another “CRN to DNA” compilation scheme using DNA strand displacement with “nicked double stranded DNA” complexes (http://journals.cambridge.org/action/displayAbstract?aid=8851390).

This inspired some of us (Niranjan Srinivas, David Soloveichik, and Erik Winfree at Caltech, Yuan-Jyue Chen and Georg Seelig at UW, and Neil Dalchau, Andrew Phillips and Luca Cardelli at MSR) to work on actually engineering complex chemical reaction networks in the “wet” lab - in order to understand the kinds of “real world” problems or issues we would need to solve to really get this technology to work. Recently our efforts resulted in some important breakthroughs: Yuan-Jyue Chen et al. (http://www.nature.com/nnano/journal/v8/n10/abs/nnano.2013.189.html) and Srinivas et al (in progress).

So here I attempt to outline some of the questions about CRNs and CRN-to-DNA technologies that are directly experimentally motivated based on our experience over the last few years.

### Questions

High level questions are labeled HQ; mid-level questions, MQ.

HQ1. Is DNA strand displacement a scalable technology for implementing chemical reaction networks?

Based on our experience, to reduce cross talk between different intended strand displacement steps and to reduce spurious “leak” reactions, it is good practice to make toeholds as “orthogonal” to each other as possible. By this, I mean that, for every pair of toeholds $s$ and $t$ in our deisgn, we would like to reduce the base-complementarity / binding energy between $s$ and $t^*$, and $t$ and $s^*$, as much as we can.

The longer we allow toeholds to be, the more sequences we have to work with: for a given length $n$, the space of possible toeholds grows roughly as $4^n$. And therefore, the larger $n$ is, the larger the set of orthogonal toeholds we could design.

However, to do anything interesting, we seem to need reversible toehold-exchange (as opposed to pure strand displacement), which requires a “spontaneous” or “thermal” toehold dissociation step. A 7 nucleotide toehold dissociates at roughly $1 \ s^{-1}$ at 25 C, and every additional nucleotide we add in length contributes roughly a factor of 10 to slow down this rate. That is, an 8-nucleotide toehold will dissociate once every _ten_ seconds, and so on. Therefore, we seem to face a fundamental limit on the lengths of our toeholds, if we want our experiments or computations to finish in a reasonable time. (Increasing the temperature can change this limit a little bit, but at 25 C = 298 K we are already a long way away from 0, so I’m not sure its going to buy us much more.)

So, given this limit on toehold length, and given that we can only use a fraction of the sequence space available to us as “orthogonal toeholds” (how this fraction scales with length, within the limit, seems important), do we have the ability to build a chemical reaction network large enough to be interesting or useful? If the answer is no, strand displacement networks based on toehold exchange seem to be, at least naively, not a scalable technology for engineering chemical reaction networks.

• MQ1. In the “well-mixed” test tube setting, any two molecules in the tube can collide with each other. If some molecules are complexes that are in high concentration and have free toeholds (which are true for both CRN to DNA schemes proposed so far), these complexes could either temporarily bind each other or other molecules in the system, even if no strand displacement between those molecules was intended. So, toeholds could get temporarily “occluded” and this could reduce the rates of intended reactions. This problem is likely to get worse on scaling, since the number of pairwise interactions is quadratic in the number of species. Is this effect significant enough to potentially affect scaling?
• MQ2. Recently, there has been much interest in implementing chemical reactions on a surface (typically DNA origami, or multiple origamis bound to form a larger platform). A pertinent point in favor of this approach is that spatial isolation could potentially allow us to re-use toeholds, since spatial separation could reduce both spurious interactions and temporary occlusion. Is CRNs-on-surface a fundamentally more scalable approach? Can we back this up (or argue against) this idea with a rigorous calculation and compare it to the well mixed setting (HQ1, MQ1). Necessary caveat: Based on our recent study of the biophysics and kinetics of strand displacement (http://nar.oxfordjournals.org/content/early/2013/09/07/nar.gkt801.short), we suspect that the surface itself, and the fact that the invading strand and displacement complex would be tethered to the surface, could possibly affect the biophysics/kinetics of strand displacement on a surface.

HQ2. How can we design or program “robust” or “fault tolerant” CRNs?

• MQ1. Experimentally, we find that not all our molecules are perfect - some may have synthesis errors - and therefore our implementations are not perfect. So, if we set out to implement the reaction $X -> Y$, it is possible that we really are implementing $X \rightarrow (1-\delta) Y$, for some $\delta$ that depends on the extent of synthesis errors. Alternatively this could be viewed as “every so often, we consume the reactant(s) without releasing the product(s)”. Could we build CRNs that are somehow robust to errors of this kind?
• MQ2. Strand displacement rate can be controlled over 5 to 6 orders of magnitude based on toehold strength, but “blunt end” or “zero toehold” strand displacement still happens at a non-zero rate. This means, from an experimental standpoint, that there are no “pure” implementations of any reactions. In particular, if one wants to implement a bimolecular reaction $A + B \rightarrow C$ with rate constant $k_1$, one is really implementing that reaction *and* another, which looks like $B \rightarrow C$ with rate constant $k_2$. Ideally, $k_2 << k_1$, but getting this to work in practice can be tricky. Can we build CRNs robust to this kind of more difficult error?

# References

1. 1.0 1.1 1.2 Soloveichik, D., Cook, M., Winfree, E., & Bruck, J. (2008). Computation with finite stochastic chemical reaction networks. Natural Computing, 7(4), 615-633.
2. 2.0 2.1 2.2 Qian, L., Soloveichik, D., & Winfree, E. (2011). Efficient Turing-universal computation with DNA polymers. In DNA computing and molecular programming (pp. 123-140). Springer Berlin Heidelberg.
3. Thachuk, C., & Condon, A. (2012). Space and energy efficient computation with DNA strand displacement systems. In DNA Computing and Molecular Programming (pp. 135-149). Springer Berlin Heidelberg.
4. 4.0 4.1 Ho-Lin Chen, David Doty, David Soloveichik (2013). Deterministic function computation with chemical reaction networks. Natural Computing 2013
5. Ho-Lin Chen, David Doty, David Soloveichik (2014). Rate-independent computation with continuous chemical reaction networks. Innovations in Theoretical Computer Science, 2014