Safe Haskell | Safe-Inferred |
---|---|
Language | Haskell2010 |
Iterators
Synopsis
- closeAllIterators ∷ IOLike m ⇒ ChainDbEnv m blk → m ()
- stream ∷ ∀ m blk b. (IOLike m, HasHeader blk, HasCallStack) ⇒ ChainDbHandle m blk → ResourceRegistry m → BlockComponent blk b → StreamFrom blk → StreamTo blk → m (Either (UnknownRange blk) (Iterator m blk b))
- data IteratorEnv m blk = IteratorEnv {
- itImmutableDB ∷ ImmutableDB m blk
- itVolatileDB ∷ VolatileDB m blk
- itIterators ∷ StrictTVar m (Map IteratorKey (m ()))
- itNextIteratorKey ∷ StrictTVar m IteratorKey
- itTracer ∷ Tracer m (TraceIteratorEvent blk)
- newIterator ∷ ∀ m blk b. (IOLike m, HasHeader blk, HasCallStack) ⇒ IteratorEnv m blk → (∀ r. (IteratorEnv m blk → m r) → m r) → ResourceRegistry m → BlockComponent blk b → StreamFrom blk → StreamTo blk → m (Either (UnknownRange blk) (Iterator m blk b))
Documentation
closeAllIterators ∷ IOLike m ⇒ ChainDbEnv m blk → m () Source #
Close all open Iterator
s.
This can be called when the ChainDB is already closed.
stream ∷ ∀ m blk b. (IOLike m, HasHeader blk, HasCallStack) ⇒ ChainDbHandle m blk → ResourceRegistry m → BlockComponent blk b → StreamFrom blk → StreamTo blk → m (Either (UnknownRange blk) (Iterator m blk b)) Source #
Stream blocks
Start & end point
The start point can either be in the ImmutableDB (on our chain) or in the VolatileDB (on our chain or on a recent fork). We first check whether it is in the VolatileDB, if not, we check if it is in the ImmutableDB (see "Garbage collection" for why this order is important). Similarly for the end point.
If a bound can't be found in the ChainDB, an UnknownRange
error is
returned.
When the bounds are nonsensical, e.g.,
> StreamFromExclusive (Point (SlotNo 3) _)
> StreamToInclusive (RealPoint (SlotNo 3) _)
An InvalidIteratorRange
exception is thrown.
Paths of blocks
To stream blocks from the ImmutableDB we can simply use the iterators offered by the ImmutableDB.
To stream blocks from the VolatileDB we have to construct a path of points
backwards through the VolatileDB, starting from the end point using
getPredecessor
until we get to the start point, genesis, or we get to a
block that is not in the VolatileDB. Then, for each point in the path, we
can ask the VolatileDB for the corresponding block.
If the path through the VolatileDB is incomplete, we will first have to
stream blocks from the ImmutableDB and then switch to the path through the
VolatileDB. We only allow the tip of the ImmutableDB to be the switchover
point between the two DBs. In other words, the incomplete path through the
VolatileDB must fit onto the tip of the ImmutableDB. This must be true at
the time of initialising the iterator, but does not have to hold during the
whole lifetime of the iterator. If it doesn't fit on it, it means the path
forked off more than k
blocks in the past and blocks belonging to it are
more likely to go missing because of garbage-collection (see the next
paragraph). In that case, we return ForkTooOld
.
Garbage collection
We have to be careful about the following: as our chain grows, blocks from our chain will be copied to the ImmutableDB in the background. After a while, old blocks will be garbage-collected from the VolatileDB. Blocks that were part of the current chain will be in the ImmutableDB, but blocks that only lived on forks will be gone forever.
This means that blocks that were part of the VolatileDB when the iterator was initialised might no longer be part of the VolatileDB when we come to the point that the iterator will try to read them. When this is noticed, we will try to open an iterator from the ImmutableDB to obtain the blocks that have moved over. However, this will only work if they were and are part of the current chain, otherwise they will have been deleted from the VolatileDB without being copied to the ImmutableDB.
This iterator is opened with an open upper bound and will be used to stream
blocks until the path has been fully streamed, the iterator is exhausted,
or a block doesn't match the expected point. In the latter two cases, we
switch back to the VolatileDB. If the block is missing from the VolatileDB,
we will switch back to streaming from the ImmutableDB. If that fails, we
switch back to the VolatileDB. To avoid eternally switching between the two
DBs, we only switch back to the VolatileDB if the stream from the
ImmutableDB has made progress, i.e. streamed at least one block with the
expected point. If no block was streamed from the ImmutableDB, not even the
first one, we know for sure that that block isn't part of the VolatileDB
(the reason we switch to the ImmutableDB) and isn't part of the ImmutableDB
(no block was streamed). In that case, we return IteratorBlockGCed
and
stop the stream.
Note that the open upper bound doesn't allow us to include blocks in the stream that are copied to the ImmutableDB after opening this iterator, as the bound of the iterator is fixed upon initialisation. These newly added blocks will be included in the stream because we will repeatedly open new ImmutableDB iterators (as long as we make progress).
Bounds checking
The VolatileDB is hash-based instead of point-based. While the bounds of a stream are points, we can simply check whether the hashes of the bounds match the hashes stored in the points.
The ImmutableDB is slot-based instead of point-based, which means that
before we know whether a block in the ImmutableDB matches a given point, we
must first read the block's hash corresponding to the point's slot from the
(cached) on-disk indices, after which we can then verify whether it matches
the hash of the point. This is important for the start and end bounds (both
points) of a stream in case they are in the ImmutableDB (i.e., their slots
are <= the tip of the ImmutableDB): we must first read the hashes
corresponding to the bounds from the (cached) on-disk indices to be sure
the range is valid. Note that these reads happen before the first call to
iteratorNext
.
Note that when streaming to an exclusive bound, the block corresponding
to that bound (Point
) must exist in the ChainDB.
The ImmutableDB will keep the on-disk indices of a chunk of blocks in memory after the first read so that the next lookup doesn't have to read from disk. When both bounds are in the same chunk, which will typically be the case, only checking the first bound will require disk reads, the second will be cached.
Costs
Opening an iterator has some costs:
- When blocks have to be streamed from the ImmutableDB: as discussed in "Bounds checking", the hashes corresponding to the bounds have to be read from the (cached) on-disk indices.
- When blocks have to be streamed both from the ImmutableDB and the VolatileDB, only the hash of the block corresponding to the lower bound will have to be read from the ImmutableDB upfront, as described in the previous bullet point. Note that the hash of the block corresponding to the upper bound does not have to be read from disk, since it will be in the VolatileDB, which means that we know its hash already from the in-memory index.
In summary:
- Only streaming from the VolatileDB: 0 (cached) reads from disk upfront.
- Only streaming from the ImmutableDB: 2 (cached) reads from disk upfront.
- Streaming from both the ImmutableDB and the VolatileDB: 1 (cached) read from disk upfront.
Additionally, when we notice during streaming that a block is no longer in the VolatileDB, we try to see whether it can be streamed from the ImmutableDB instead. Opening such an iterator costs 2 (cached) reads from disk upfront. This can happen multiple times.
Exported for testing purposes
data IteratorEnv m blk Source #
Environment containing everything needed to implement iterators.
The main purpose of bundling these things in a separate record is to make it easier to test this code: no need to set up a whole ChainDB, just provide this record.
IteratorEnv | |
|
∷ ∀ m blk b. (IOLike m, HasHeader blk, HasCallStack) | |
⇒ IteratorEnv m blk | |
→ (∀ r. (IteratorEnv m blk → m r) → m r) | Function with which the operations on the returned iterator should
obtain their |
→ ResourceRegistry m | |
→ BlockComponent blk b | |
→ StreamFrom blk | |
→ StreamTo blk | |
→ m (Either (UnknownRange blk) (Iterator m blk b)) |
See stream
.