Introduction

What we are looking for is an offline algorithm to construct a data structure so that the online search is quick. There are no time restrictions on the offline algorithm, but the online search runs on an embedded platform where there are hard time constraints. In stark terms, if the online algorithm is too slow the result can be the loss of a life and a rather expensive airplane.

The definitions are not rigorous. This web page is meant to be read by interested, particularly technically savvy, parties. The actual code contains a greater degree of technical detail.

The algorithm is not specific to the domain described on this page; it is suitable for any problem of this kind. It's going to take some time to get to the algorithm, so be patient.

This problem first came to my attention during work on a Radar Warning Receiver (RWR). An RWR is a device mounted on a fighter plane to determine whether an incoming radio signal (EMI) poses a danger to the pilot or is just, say, a cell phone. It is responsible for analyzing all incoming radio signals, finding the things of reportable interest, and ignoring the rest. In short: if it's a missile, the pilot should be informed; if it's a cell phone, ignore it.

The incoming signal has a number of discoverable parameters: for example, the pulse width (PW), the pulse recurrence interval (PRI), the signal polarization, the frequency, and so on. This page discusses how to organize that data so the searches are quick.

Modern radars, and cell phones too for that matter, can do a number of things to make the RF signal difficult to identify. The radar can change the signal frequency within a given radar-specific range, or change its Pulse Recurrence Interval (PRI), Pulse Width (PW), amplitude, polarization, and other parameters. Since a number of radars can have overlapping characteristics, this adds complexity to the process of detection. We will only consider frequency (f) changes within a given range of frequencies. This means that a radar can emit a frequency f anywhere in a continuous range [flow, fhigh]. This range constitutes the quantization spectrum. However, the analysis applies to any other radar parameter (or other object) which has a continuous range of values rather than discrete values. We consider parameters with a discrete number of states as enumerable, in the sense that a unique number can be assigned to each state.

Notation

The question then is: given an input frequency f, how do we associate this frequency with a given set of radars? Let's look at a simple way of doing this search. Suppose our search algorithm searches for all ranges in which f is found, and returns the set of radars associated with these ranges. The search would look something like:

for all ranges {
   if (filow <= f and f <= fihigh) then save the radar name;
}
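
Purely for concreteness, here is a minimal C++ rendering of this loop; the record layout and names are illustrative, not taken from the production code.

#include <string>
#include <vector>

// Illustrative record: one radar's frequency interval F = [flow, fhigh].
struct RadarInterval {
    double flow;
    double fhigh;
    std::string name;
};

// Return the name of every radar whose interval contains the input
// frequency f. Every record must be examined because the intervals overlap.
std::vector<std::string> findRadars(const std::vector<RadarInterval>& intervals,
                                    double f) {
    std::vector<std::string> hits;
    for (const RadarInterval& r : intervals)
        if (r.flow <= f && f <= r.fhigh)
            hits.push_back(r.name);
    return hits;
}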

Let’s put some numbers in and see what we get. Suppose there are 4,096 (2^12) radar ranges. Then the ‘for’ loop must execute 4,096 times to find all ranges that f can participate in. It can’t just stop at the first one found, because the ranges overlap. Let’s take a little look at what this overlap could look like.

  o----------------------------------o      o------o
  |                                  |      |      |
  |      o-------------o             |      |      |
  |      |             |             |      |      |
  |      |      o------|------o      |      |      |
  |      |      |      |      |      |      |      |
  |  R1  |  R2  |  R3  |  R4  |  R5  |  R6  |  R7  |
  |      |      |      |      |      |      |      |
  o------o------o------o------o------o------o------o
  |      |      |      |      |      |      |      |
  f1low  f2low  f3low  f2high f3high f1high f4low  f4high
  
                Figure 1

where

  • Ri     is the region between two successive frequency boundaries.
  • filow  is the low end of a frequency interval.
  • fihigh is the high end of a frequency interval.

To make life easier, let’s use some terminology.

  • [ ] stands for a frequency interval [flow, fhigh]. All frequency intervals are closed, that is, both flow and fhigh are contained in the interval.
  • [ ) stands for a range [flow, fhigh) with the right side ‘open’, that is, flow is contained in the interval but fhigh is not.
  • F = [flow, fhigh]
  • R = [flow, fhigh)

And some names:
Frequency Intervals (Figure 1)

  • F1 = [f1low, f1high]
  • F2 = [f2low, f2high]
  • F3 = [f3low, f3high]
  • F4 = [f4low, f4high]

Ranges

  • R1 = [f1low,  f2low   )
  • R2 = [f2low,  f3low  )
  • R3 = [f3low,  f2high )
  • R4 = [f2high, f3high )
  • R5 = [f3high, f1high )
  • R6 = [f1high, f4low  )
  • R7 = [f4low,  f4high )

We notice:
   F1 overlaps F2 and F3
   F2 overlaps F3
   F4 has no overlap
   Ri are all disjoint

Work Function

Before we go back to the algorithm, let’s look at the pragmatics. The RWR is embedded in a computer in a plane. The computer has other things to do than to search for a radar. So what the computer does is periodically start and then interrupt the execution of the radar search function. This time period is called a time slice. If the search function has not completed its job, well then too bad. The pilot is not notified that he will shortly be shot down and we can be assured that both plane and pilot are lost.

How much work is done? Well, on a calm day the search algorithm may be presented with 1,024 (2^10) frequencies per time slice. Now let’s look at the algorithm and the figure above. When do we stop searching? Never. In order to capture all the radars whose intervals contain the input f we have to search the entire database of records. That works out to about 2^10 * 2^12 = 2^22 = 4,194,304 probes per time slice, with each probe comparing f against a radar’s interval, F = [flow, fhigh]. The total is then 2^22 probes.

It is possible to convert each probe into a work function. With each probe, the software does up to two comparisons: it checks whether the input frequency f is less than flow, and whether it is greater than fhigh. If either condition is satisfied the range is rejected and we move on to the next one; if both fail, then flow <= f <= fhigh and the radar is recorded. The actual work performed is approximately 1½ times the number of probes (probes fail on the first check about half the time and on the second check about half the time).

If we consider the above formulation a rejection test, then we can reformulate the comparisons as an inclusion test: if f >= flow and f <= fhigh then the search is successful. The work function remains the same.
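
Here is a small sketch, not taken from the RWR code, of the two formulations; counting comparisons this way is what gives the roughly 1½ comparisons per probe noted above.

// Rejection test: a probe rules the range out as soon as one comparison
// succeeds. Rejecting probes cost one comparison about half the time and
// two comparisons the rest of the time, roughly 1.5 comparisons per probe.
bool rejected(double f, double flow, double fhigh) {
    if (f < flow)  return true;    // first comparison rejects ~half the time
    if (f > fhigh) return true;    // second comparison rejects the rest
    return false;                  // both fail: flow <= f <= fhigh
}

// Inclusion test: the equivalent positive formulation.
bool included(double f, double flow, double fhigh) {
    return flow <= f && f <= fhigh;
}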

In an actual RWR there are more parameters than just the frequency f. In that case it is possible to terminate the search early, and we would expect that on average the search terminates after 2^11 = 2,048 probes. That is a considerable improvement over the above, with a total work function of about 2^21 tests. It assumes, however, that every input frequency falls within the stored frequency ranges; for any frequency higher than the highest stored frequency, the search fails only after the entire database of 2^12 entries has been searched. Failure costs.

Algorithm

When I first saw this I said that there must be a way to optimize the search, to determine a ‘best’ way to search, and some criterion for terminating it. The gold standard for searching is the binary search (although there is one better option, hashing, which is good on paper). But no matter how I approached the problem, the issue of an F1 kept popping up: a frequency interval which spans a large portion of the other intervals. In tangible terms, any f in the interval F1 may also lie in F2 and F3. Sorting the intervals alone won’t make it quick to find all the frequency intervals containing f.

But if we look at the ranges, R, we note that they can be sorted into a monotonically increasing set of disjoint intervals. Further, the end point of every range is the same as the start point of the succeeding range. Now this is interesting. If the ranges are sorted, their boundary points (each range’s start, plus the final range’s end) form a sorted list of unique values on which a binary search can be performed. Now here’s an interesting thing about a binary search: for N entries it takes only log2 N probes to find anything. In our case this means that instead of doing 2^12 tests for each input f, we need at most 12 probes (about 11 on average). As an added bonus the work function is reduced: instead of 1½ times the number of probes, we get 1 times the number of probes. One check per probe is all that is needed to determine which range, Ri, the input f falls in.

So at least there is a solution to the problem. Instead of focusing on the frequency intervals [flow, fhigh], we focus on the ranges R = [flow, fhigh). The algorithm generates a set of ranges for a given input set of frequency intervals and populates each member of the set with the emitters whose frequency interval covers that range.
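
As a sketch of the resulting runtime lookup (the table layout and names are assumptions for illustration, not the layout used in the production code): keep the sorted range boundaries in one array and a parallel array of per-range emitter sets, then locate the range containing f with a single binary search.

#include <algorithm>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// boundaries[i] is the start of range Ri and boundaries[i+1] is its end,
// so Ri = [boundaries[i], boundaries[i+1]). emitters[i] holds the names of
// every emitter whose frequency interval covers Ri; emitters therefore has
// one entry fewer than boundaries.
struct RangeTable {
    std::vector<double> boundaries;               // sorted, unique
    std::vector<std::set<std::string>> emitters;  // one set per range
};

// Binary search: about log2 N probes instead of N, one comparison per probe.
const std::set<std::string>* lookup(const RangeTable& t, double f) {
    if (t.boundaries.size() < 2 || f < t.boundaries.front() || f >= t.boundaries.back())
        return nullptr;   // f lies outside every range
    // upper_bound finds the first boundary greater than f; the range that
    // contains f starts at the boundary just before it.
    auto it = std::upper_bound(t.boundaries.begin(), t.boundaries.end(), f);
    std::size_t i = static_cast<std::size_t>(it - t.boundaries.begin()) - 1;
    return &t.emitters[i];
}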

Brute Force Spectra Quantization

Let’s look at an obvious way to extract ranges. Suppose we have already generated a set of ranges, { R }, from a set of frequency intervals, { F }. How do we expand { R } given a new F? We search { R } to find the range containing flow and split it into two parts: Rlow, containing all frequencies less than flow, and Rhigh, containing all frequencies greater than or equal to flow. Then we repeat this for fhigh. Each new frequency interval F therefore adds up to two new ranges. A simple procedure with only one or two defects.

Let’s look at the mechanics of this algorithm and the upper bound on the number of steps performed. We’ll take two possible implementations.

  1. Array: Use an array to hold the set { R }. The searches for F in { R } are binary, one for each of flow and fhigh, which gives an upper bound of 2 |{ F }| log2 |{ R }| steps, that is, 2 times the number of frequency intervals times the log of the total number of ranges, plus roughly 2 |{ R }|^2 steps to account for inserting new ranges into the existing array of ranges (each insertion shifts the elements above it). The cost is small for searches and large for insertions.
  2. List: Use a list to hold the set { R }. The searches are linear, which gives an upper bound of 2 |{ F }| |{ R }|/2 steps, that is, 2 times the number of frequency intervals times the average search length of half the total number of ranges. Add 2 |{ F }| to account for inserting 2 new ranges per interval into the existing list of ranges. The cost is large for searches and small for insertions.

The results above are upper-bound approximations of the number of steps required. There is no accounting for the time any individual step takes, and therefore no indication of how long it will take to construct the final set of ranges. Not considered above, but essential to calculating the bounds, is the treatment of the set of emitters within each range. Each range R carries the set of emitter names whose frequency intervals cover R. When a range is split, the entire set of emitters is copied to the new range, and the single emitter associated with the frequency interval being inserted is added to the appropriate ranges. This happens twice, once for flow and once for fhigh. There is a cost to this, and that cost can be quite large in both implementations above. It looks like it will take a long time.
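
As a sketch of the split step (an array-based version; the decision to copy the emitter set wholesale follows the description above, but the names and layout are otherwise illustrative):

#include <cstddef>
#include <set>
#include <string>
#include <vector>

struct Range {
    double low;
    double high;                      // R = [low, high)
    std::set<std::string> emitters;   // emitters whose interval covers R
};

// Split the range containing f into [low, f) and [f, high), copying the
// emitter set into both halves. Called once for flow and once for fhigh
// of each new frequency interval F.
void splitAt(std::vector<Range>& ranges, double f) {
    for (std::size_t i = 0; i < ranges.size(); ++i) {
        if (ranges[i].low < f && f < ranges[i].high) {
            Range high{f, ranges[i].high, ranges[i].emitters};  // emitter set copied
            ranges[i].high = f;
            ranges.insert(ranges.begin() + i + 1, high);
            return;
        }
    }
}

After splitting at flow and at fhigh, the new emitter’s name is added to every range that now lies inside [flow, fhigh); that per-range insertion, plus the emitter-set copies, is where the large hidden cost comes from.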

Optimal Spectra Quantization

The algorithm actually used takes a different approach. Instead of brute force, it considers the relationship that Fi has with Fi+1 (see Figure 1) and decides what to do. There are two passes. During the first pass the frequency bounds flow and fhigh, each with its associated emitter, are written out, yielding a file of size 2 |{ F }|. During the second pass the file is sorted (approximately 2 |{ F }| log2(2 |{ F }|) steps) and then traversed linearly (2 |{ F }| steps). Depending on the relationship between two successive elements, a new R is created, the input is stacked, or the stack is emptied. Each of these actions takes approximately 1 step (for creating a new R or stacking the input) or some σ |{ F }| steps for emptying the stack, where σ is unknown but quite small. The upper bound is 3 |{ F }| + 2 |{ F }| log2(2 |{ F }|) steps: linear traversals plus a logarithmic sort.
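
Purely as a sketch of the same two-pass shape (sorted boundary records followed by one linear traversal), here is a version that keeps an active-emitter set instead of the stack described above; the record layout, names, and tie handling are illustrative assumptions, not the production implementation in cIdealPartition.cpp.

#include <algorithm>
#include <set>
#include <string>
#include <vector>

// One boundary record from pass 1: a frequency, whether it is the low or
// high end of an interval, and the emitter owning that interval.
struct Boundary {
    double f;
    bool isLow;
    std::string emitter;
};

// One output range R = [low, high) plus every emitter whose interval covers it.
struct Range {
    double low;
    double high;
    std::set<std::string> emitters;
};

// Pass 2 sketch: sort the 2|{F}| boundary records, then sweep them once,
// emitting a range whenever the set of "active" emitters is non-empty
// between two successive boundary values. Ties and the closed right end
// of the last interval are ignored here for brevity.
std::vector<Range> quantize(std::vector<Boundary> b) {
    std::sort(b.begin(), b.end(),
              [](const Boundary& x, const Boundary& y) { return x.f < y.f; });

    std::vector<Range> ranges;
    std::set<std::string> active;   // emitters whose interval covers the sweep point
    double prev = 0.0;
    bool havePrev = false;

    for (const Boundary& cur : b) {
        if (havePrev && cur.f > prev && !active.empty())
            ranges.push_back({prev, cur.f, active});
        if (cur.isLow)
            active.insert(cur.emitter);
        else
            active.erase(cur.emitter);
        prev = cur.f;
        havePrev = true;
    }
    return ranges;
}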

A complete description of the algorithm is presented in cIdealPartition.cpp in the code. The embedded system uses the set of ranges created, { R }, and since this is a sorted list, it can do a binary search for any given input f. In our case this reduces the search steps from 2^12 to 12.

Let’s take just one more little look at what we can do. The algorithm yields ideal partitions: there is no smaller or larger partitioning that both holds all of the emitter names and keeps every frequency in a range wholly contained in the frequency interval of each of its emitters. A smaller partition yields duplicates (emitter names replicated across ranges) and a larger partition yields emitters for which only part of the frequency interval is included. The ranges, R, are variable sized but ideal.

Non-ideal Quantization

Suppose that we use fixed-size ranges instead. What happens? There are two impacts of note. First, instead of each emitter’s frequency sub-interval being entirely contained in a range, there are cases where only part of a sub-interval is contained. This increases the number of emitters in a given fixed-size range and introduces some ambiguity: since some emitters’ sub-intervals are not wholly contained in an R, an input f that falls in R may not fall in every sub-interval listed for R. Second, there is no search for the range. The range index i is calculated directly by taking the integer part of the input f divided by the fixed range size, i = int(f/size), and Ri is the range containing f. We then extract the list of emitter names, some of whose frequency sub-intervals are entirely in R and some only partially, and search this list for the sub-intervals that actually contain f. The lookup retrieves candidate emitters; the follow-up ‘search’ is application specific and may be sub-linear. In some cases other information can be used to distinguish the emitters, or to reduce the number of frequency sub-intervals to be searched.
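
A sketch of the fixed-size lookup, assuming the ranges start at frequency 0 and all have width size (both assumptions for illustration):

#include <cstddef>
#include <set>
#include <string>
#include <vector>

// table[i] holds the candidate emitters for the fixed range
// Ri = [i*size, (i+1)*size); some candidates only partially overlap Ri.
struct FixedTable {
    double size;                                 // fixed range width
    std::vector<std::set<std::string>> table;
};

// No search: the range index is computed directly from f (assumes f >= 0).
// Each returned name is only a candidate and must still be checked against
// that emitter's actual frequency sub-interval.
const std::set<std::string>* candidates(const FixedTable& t, double f) {
    std::size_t i = static_cast<std::size_t>(f / t.size);
    if (i >= t.table.size())
        return nullptr;                          // f above the highest range
    return &t.table[i];
}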

Conclusion

And now, at the end: the production code allows investigation of both ideal and non-ideal ranges. Ideal ranges have a variable partition size and are located with a binary search; non-ideal ranges have a fixed size and are located with a little arithmetic in lieu of a search.

Signatures

Quantization of a continuous spectrum is significant in changing linear searches into binary searches. But this is a rather limited context. In our emitter example there are several continuous domains, one each for frequency (f), pulse width (p) and pulse recurrence interval (pri), as well as enumerable domains for polarization (θ) and modulation type (m). How do we organize this information in a manner that allows a binary search?

What we do is aggregate all terms into a Signature, and sort these Signatures into a searchable table. Signature formation consists of packing terms: the packing algorithm concatenates each term into the minimum bit-space needed to represent it. The resultant packed items are rounded up to a multiple of a 16-bit word and sorted; this becomes the runtime searchable Signature Table. Input parameters are packed into an input Signature, and this Signature is searched for in the Signature Table. The 16-bit boundary is a hardware consideration to reduce search time: it is convenient to round a Signature up to the size the computer hardware searches best. We have chosen 16 bits; if the hardware were byte (8-bit octet) efficient, we would round up to the next byte.

As an example of term packing, let’s consider:

  • Polarization having 8 discrete values. 3 bits (log2 8) are required.
  • Modulation type having 16 discrete values. 4 bits (log2 16) are required.
  • Frequency having 8,192 ranges. 13 bits (log2 8,192) are required.
  • Pulse Width having 1,024 ranges. 10 bits (log2 1,024) are required.
  • Pulse Recurrence Interval having 512 ranges. 9 bits (log2 512) are required.

All the chosen values are arbitrary and bear no relation (that I know of) to actual ranges. But we now know our Signature size: it is the sum of the required bits, 39 bits. Rounding up to the next 16-bit word, each entry in the Signature Table is 48 bits (three 16-bit words).
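
As a sketch of the packing, using the field widths from the example above (the field order, the helper name, and the use of a single 64-bit integer are illustrative assumptions; the real table rounds each entry up to three 16-bit words):

#include <cstdint>

// Concatenate the five terms into one 39-bit value, left to right:
// polarization (3), modulation (4), frequency range (13), pulse width
// range (10), PRI range (9).
uint64_t packSignature(unsigned polarization,  //  3 bits,     8 values
                       unsigned modulation,    //  4 bits,    16 values
                       unsigned freqRange,     // 13 bits, 8,192 ranges
                       unsigned pwRange,       // 10 bits, 1,024 ranges
                       unsigned priRange)      //  9 bits,   512 ranges
{
    uint64_t s = 0;
    s = (s << 3)  | (polarization & 0x7);
    s = (s << 4)  | (modulation   & 0xF);
    s = (s << 13) | (freqRange    & 0x1FFF);
    s = (s << 10) | (pwRange      & 0x3FF);
    s = (s << 9)  | (priRange     & 0x1FF);
    return s;
}

Sorting these packed values once offline is what makes the runtime Signature Table binary-searchable.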

On input, the input parameters are packed into an input Signature, and this Signature is searched for in the Signature Table. If the Signature is not found, then this is an unknown emitter. Otherwise the emitter is identified and can be processed.

Using our assumed Signature values, the maximum number of probes required to construct an Input Signature is 1 for each discrete value (θ, m), 13 for frequency (f), 10 for Pulse Width (p), and 9 for the Pulse Recurrence Interval (pri), or 34 probes total; the expected number is about 18 probes. As a point of comparison, if we assume that there are 4,096 (2^12) emitters, then a linear search would yield a maximum probe count of 4,096 and an expected probe count of 2,048.

A 5 Aug 2021 webinar contains additional information on this topic.

Epilogue

But, alas, there is more. I gave a webinar to the Association of Old Crows, a SIGINT group. The slides for this webinar show additional options and additional statistics. They are included here for your pleasure.

The .exe file included in the zip file can be executed on a Windows x64 computer.

The implementation has a help file which can be executed as:

> partition.exe --help