We have been busy implementing the BSA spec, and there are a few things in the
DsLSRBioObjects part that we would like your opinion on.
The biggest one concerns the Alignment interface. As all of us except Ewan
anticipated, Alignment is tricky to implement efficiently because of gaps.
The only way to find the gaps is to invoke get_seq_region() repeatedly and
check whether it returns null. Of course, an AlignmentEncoder can do this on
the server side, but this is still clumsy (and the encoder is optional).
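To make the cost of probing concrete, here is a minimal Python sketch. It assumes a hypothetical `probe(column)` callable standing in for a remote get_seq_region() call, returning the residue position aligned at that column or None for a gap; `make_probe` and `find_gaps_by_probing` are illustrative names, not part of the spec.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a remote get_seq_region() call: given an
# alignment column, return the residue position aligned there, or None
# if the column is a gap.
Probe = Callable[[int], Optional[int]]


def make_probe(row: str, gap_char: str = "-") -> Probe:
    """Build a local fake probe from an aligned row such as "--AC-GT---"."""
    positions: List[Optional[int]] = []
    residue = 0
    for ch in row:
        if ch == gap_char:
            positions.append(None)
        else:
            residue += 1
            positions.append(residue)
    return lambda col: positions[col]


def find_gaps_by_probing(probe: Probe, n_columns: int) -> Tuple[List[Tuple[int, int]], int]:
    """Return gap runs as (first column, number of columns), plus the number
    of probe calls needed to find them."""
    runs: List[Tuple[int, int]] = []
    calls = 0
    run_start: Optional[int] = None
    for col in range(n_columns):
        calls += 1  # one invocation per alignment column
        if probe(col) is None:
            if run_start is None:
                run_start = col
        elif run_start is not None:
            runs.append((run_start, col - run_start))
            run_start = None
    if run_start is not None:  # gap run reaching the last column
        runs.append((run_start, n_columns - run_start))
    return runs, calls


runs, calls = find_gaps_by_probing(make_probe("--AC-GT---"), 10)
print(runs, calls)  # [(0, 2), (4, 1), (7, 3)] 10
```

Every column costs one invocation, so even this 10-column row needs 10 calls to locate three gap runs; over CORBA each call is a potential round trip.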
Our main proposal is to add the operation

    IntervalList get_gaps(in AlignmentElement element, in Interval the_interval);

to the Alignment interface. Its job is simply to return all the gaps of a
particular sequence in a particular alignment. For symmetry with
get_seq_region(), the_interval is also passed in, restricting the result to
the gaps you are interested in.
[ Note: the declaration

    typedef sequence<Interval> IntervalList;

is missing from the spec; can it be added? ]
A single gap would be represented by an Interval, hence the use of Interval
above. We would also add

    typedef Interval Gap;

to make the semantics clearer (elsewhere, an Interval denotes an existing
segment; here it denotes a missing one). The coordinates of a gap would be
those of the original sequence, and gaps of length 0 are not allowed. A gap
with gap.start == 0 lies before the first nucleotide/amino acid; gap.start == N
denotes a gap between nucleotides/amino acids N and N+1 (so
gap.start == sequence.length lies after the last one).
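A minimal Python sketch of these semantics follows. It assumes a hypothetical `Gap(start, length)` record in which `length` counts gap columns in the alignment (the proposal does not say how length is measured), and it assumes the_interval is a 1-based inclusive residue range `first..last`. Keeping every gap that touches that range, including one just before `first` or just after `last`, is also our assumption; `gaps_from_row` and this `get_gaps` are illustrative, not the spec's API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Gap:
    # Hypothetical stand-in for the proposed `typedef Interval Gap`.
    start: int   # gap lies between residues start and start+1 (0 = before the first)
    length: int  # assumption: number of gap columns in the alignment; always > 0


def gaps_from_row(row: str, gap_char: str = "-") -> List[Gap]:
    """Collect the gaps of one aligned sequence in the coordinates of the
    original (ungapped) sequence."""
    gaps: List[Gap] = []
    residues_seen = 0               # residues of the original sequence so far
    run_start: Optional[int] = None
    run_length = 0
    for ch in row:
        if ch == gap_char:
            if run_start is None:
                run_start = residues_seen
            run_length += 1
        else:
            if run_start is not None:
                gaps.append(Gap(run_start, run_length))
                run_start, run_length = None, 0
            residues_seen += 1
    if run_start is not None:       # trailing gap: start == sequence length
        gaps.append(Gap(run_start, run_length))
    return gaps


def get_gaps(row: str, first: int, last: int) -> List[Gap]:
    """Sketch of get_gaps(): keep only gaps touching residues first..last."""
    return [g for g in gaps_from_row(row) if first - 1 <= g.start <= last]


print(gaps_from_row("--AC-GT---"))
# [Gap(start=0, length=2), Gap(start=2, length=1), Gap(start=4, length=3)]
print(get_gaps("--AC-GT---", 2, 3))  # [Gap(start=2, length=1)]
```

The row "--AC-GT---" has four residues, so its trailing gap gets start 4, i.e. sequence.length, matching the convention above.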
Another proposal is to split Alignment into more manageable pieces:

    interface SimpleAlignment : CosLifeCycle::LifeCycleObject {
        // here, everything except get_seq_region()
    };

    interface Alignment : SimpleAlignment {
        // get_seq_region() and the gap operation, e.g.
        IntervalList get_gaps(
            in AlignmentElement element,
            in Interval the_interval);
    };