
Paged Optimizers in Microcontroller Systems

Updated 18 August 2025
  • Paged optimizers are algorithmic frameworks that group frequently interacting functions on the same memory page to drastically reduce page selection instructions in constrained microcontrollers.
  • They employ a three-step heuristic involving data-flow analysis, weighted function relation graph construction, and greedy graph partitioning to effectively manage function placement.
  • Experimental results in real-world embedded applications show a 13.2% reduction in code size, directly lowering ROM consumption and enhancing system efficiency.

Paged optimizers are algorithmic frameworks and heuristics devised to efficiently assign program functions to memory pages in constrained microcontroller architectures, thereby reducing the insertion and execution overhead of page selection instructions (PSIs). Their design specifically targets 8-bit microcontrollers where the instruction set and address bus width severely limit addressable memory, necessitating a page selection register (PSR) and explicit PSI management. The central goal of a paged optimizer is to minimize global code size and execution overhead by intelligently co-locating tightly interacting functions in the same memory page.

1. Motivation and Architectural Context

In low-resource embedded microcontroller systems, code memory is segmented into multiple pages, each uniquely addressed via a page selection register (PSR). A change in control flow between instructions residing in different memory pages requires explicit page selection instructions. These PSIs consume instruction slots, increase code size, and degrade execution efficiency. Therefore, paged optimizers aim to minimize the frequency and placement of PSIs by exploiting the structure of control flow—most notably, by grouping caller–callee pairs and frequently interacting functions within the same page.

The memory architecture demands that for every function call or unconditional jump (“goto”), the PSR holds the correct page number. If not, an additional PSI must be issued, incurring overhead.
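The PSI overhead described above can be illustrated with a small model. The following sketch is illustrative only: the page size, address layout, and the rule "one PSI per PSR mismatch" are simplifying assumptions, not the actual HR6P instruction set semantics.

```python
# Hypothetical sketch: counting page selection instructions (PSIs)
# for a call/goto trace over paged code memory.

PAGE_SIZE = 2048  # bytes per code page (assumed, device-specific in reality)

def page_of(address):
    """Return the code page holding a given address."""
    return address // PAGE_SIZE

def count_psis(call_trace, layout):
    """Count PSIs needed for a sequence of calls/gotos.

    call_trace: function names in control-flow order.
    layout: dict mapping function name -> start address.
    A PSI is required whenever the target's page differs from the
    page currently held in the page selection register (PSR).
    """
    psr = None  # PSR value unknown at program start
    psis = 0
    for fn in call_trace:
        target_page = page_of(layout[fn])
        if psr != target_page:
            psis += 1          # insert a PSI to load the PSR
            psr = target_page
    return psis
```

Placing frequently alternating callers and callees on the same page drives the mismatch count, and hence the PSI count, toward zero.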

2. Heuristic Algorithm Structure

The optimizer employs a three-step heuristic algorithm to achieve page-efficient function placement:

A. Analysis Process: Data-flow Tracking of PSR Influence

The algorithm first uses basic data-flow analysis to determine how each program control block influences the PSR. Each basic block b is characterized by two sets:

  • Generation set:

$\text{Gen}(b) = \begin{cases} \text{RetVop}(i) & \text{if } b \text{ is a PNTB and } i \text{ is its last PNTI} \\ \varnothing & \text{if } b \text{ is a PTB} \end{cases}$

  • Kill set:

$\text{Kill}(b) = \begin{cases} \{\, f \mid f \in \text{RetVop}(i) \,\} & \text{if } b \text{ is a PNTB} \\ \varnothing & \text{if } b \text{ is a PTB} \end{cases}$

In/out sets for each block are propagated to a fixed point via:

$\text{In}(b_i) = \bigcup_{b_j \in \text{pred}(b_i)} \text{Out}(b_j)$

$\text{Out}(b_i) = \text{Gen}(b_i) \cup \bigl( \text{In}(b_i) \setminus \text{Kill}(b_i) \bigr)$

This analysis precisely determines the PSR-related state at every program point, using a function RetVop that infers the effective PSR after a call or goto.
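The propagation above is a classical forward data-flow fixed point and can be sketched with a simple iterative solver. The block names, CFG, and Gen/Kill contents below are invented for illustration, with RetVop abstracted into precomputed Gen/Kill sets; the transfer function is the classical Out(b) = Gen(b) ∪ (In(b) \ Kill(b)).

```python
# Minimal fixed-point sketch of the In/Out propagation (assumed inputs).

def dataflow(blocks, preds, gen, kill):
    """Iterate In/Out sets to a fixed point.

    In(b)  = union of Out(p) over predecessors p of b
    Out(b) = Gen(b) | (In(b) - Kill(b))
    """
    out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            in_b = set().union(*(out[p] for p in preds[b])) if preds[b] else set()
            new_out = gen[b] | (in_b - kill[b])
            if new_out != out[b]:
                out[b] = new_out
                changed = True
    return out
```

On a straight-line CFG b1 → b2 → b3 where b2 kills what b1 generates, only b2's generated value reaches b3.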

B. Building Process: Weighted Function Relation Graph (FRG) Construction

Using the results of the analysis, the optimizer constructs a weighted Function Relation Graph:

  • Each node is a program function.
  • Each edge between functions g and h has an initial weight of zero.

During the program scan, every page-nontransparent instruction (PNTI) updates the corresponding edge weight as follows:

$w(g, h) \leftarrow w(g, h) + \frac{\text{PreValue}}{|\text{VOP}(i-1)|}$

where PreValue reflects the cost saved if both g and h are placed together. This encodes the savings potential when call/goto-induced PSR switches can be avoided by assigning both functions to the same page.
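The weight accumulation can be sketched as follows. The flat record format for PNTIs is an assumption made for illustration; in the actual optimizer, PreValue and |VOP(i−1)| come from the preceding data-flow analysis.

```python
from collections import defaultdict

# Illustrative sketch of Function Relation Graph (FRG) construction
# from simplified PNTI records (assumed format).

def build_frg(pntis):
    """Accumulate edge weights w(g, h) from PNTI records.

    pntis: iterable of (g, h, pre_value, vop_size) tuples, where
    pre_value is the cost saved if g and h share a page and
    vop_size stands in for |VOP(i-1)|.
    """
    w = defaultdict(float)
    for g, h, pre_value, vop_size in pntis:
        key = tuple(sorted((g, h)))      # undirected edge
        w[key] += pre_value / vop_size   # w(g,h) += PreValue / |VOP(i-1)|
    return dict(w)
```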

C. Partitioning Process: Greedy Graph Partitioning

The algorithm partitions the FRG’s nodes into pages while adhering to:

  • Each function is assigned to exactly one page.
  • Page count and page size are hardware-constrained.
  • Functions joined by high-weight edges are grouped on the same page; the savings are realized by setting the corresponding edge weight to zero.

The procedure is:

  1. Sort functions by descending size.
  2. Sequentially assign functions to the page that yields the maximum decrease in total expected PSI cost.
  3. Update edge weights and statistics at each insertion, reflecting the realized savings.

This greedy strategy exploits locality of interaction and function size to cluster “hot” caller-callee chains, minimizing uncoalesced transitions.
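The three-step procedure above can be sketched as follows. The page-capacity model, the gain rule (sum of edge weights to functions already on a candidate page), and the first-fit tie-breaking are assumptions for illustration; the paper's exact cost bookkeeping may differ.

```python
# Greedy partitioning sketch: largest functions first, each placed on
# the feasible page that maximizes realized edge-weight savings.

def greedy_partition(sizes, weights, num_pages, page_capacity):
    """Assign functions to pages.

    sizes: dict function -> code size.
    weights: dict (f, g) sorted tuple -> FRG edge weight.
    """
    pages = [{"free": page_capacity, "funcs": set()} for _ in range(num_pages)]
    placement = {}
    # Step 1: sort functions by descending size.
    for fn in sorted(sizes, key=sizes.get, reverse=True):
        best, best_gain = None, -1.0
        for idx, page in enumerate(pages):
            if page["free"] < sizes[fn]:
                continue  # page full: hardware size constraint
            # Step 2: gain = weight of edges to functions already here.
            gain = sum(weights.get(tuple(sorted((fn, other))), 0.0)
                       for other in page["funcs"])
            if gain > best_gain:
                best, best_gain = idx, gain
        if best is None:
            raise ValueError(f"{fn} does not fit on any page")
        # Step 3: place the function, realizing the savings.
        pages[best]["funcs"].add(fn)
        pages[best]["free"] -= sizes[fn]
        placement[fn] = best
    return placement
```

With ample capacity, a heavily connected caller–callee pair lands on the same page; when capacity forbids it, the pair is split and the edge weight remains as unsaved cost.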

3. Overhead Reduction Mechanism

Page grouping capitalizes on the fact that inter-function jumps and calls are the principal sources of PSR switches:

  • Fewer inter-page control transitions yield fewer inserted PSIs.
  • When “affinity-linked” functions share a page, the required value is often already present in the PSR, eliminating further PSI insertion.

The optimizer’s effectiveness is quantified as:

$C_{\text{total}} \approx \sum_{\text{edges } (i,j) \text{ not grouped}} w(i,j)$

where the sum aggregates unsaved PSI costs resulting from functions partitioned into separate pages.
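Evaluating this residual cost for a given placement is a one-line fold over the edge set, sketched here with the same assumed dictionary representations as above:

```python
# Residual PSI cost: sum of weights over edges whose endpoints were
# partitioned onto different pages (ungrouped edges).

def residual_cost(weights, placement):
    """C_total ~ sum of w(i, j) over edges not grouped on one page."""
    return sum(w for (i, j), w in weights.items()
               if placement[i] != placement[j])
```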

4. Experimental Performance

In a suite of 21 real-world embedded applications compiled with HICC for HR6P microcontrollers, the heuristic paged optimizer reduced aggregate code size by 13.2% (i.e., final code size was 86.8% of baseline). This reduction is highly significant for embedded system design, as it:

  • Directly lowers ROM consumption,
  • Potentially reduces battery usage (shorter execution paths),
  • Enables the deployment of less expensive microcontrollers with smaller memory footprints.

5. Implementation Characteristics and Trade-offs

Paged optimizers based on this heuristic are characterized by:

  • NP-hardness of optimal partitioning; heuristic greedy assignment yields practical, cost-effective results.
  • The static analysis required is well-understood and efficiently implementable using classical data-flow techniques.
  • Partitioning quality depends on the function interaction graph; denser graphs admit higher PSI savings.

Trade-offs include:

  • In complex control graphs, greedy partitioning may locally optimize but globally miss better assignments.
  • Incremental recompilation is possible, but global graph changes may necessitate full recomputation.
  • For extreme code density or frequently changing application structure, additional refinement heuristics may be advisable.

6. Significance and Application Scope

This optimization strategy is fundamental for memory-constrained microcontrollers and embedded systems, particularly where:

  • Page size and count are strictly bounded,
  • Function layout interacts non-trivially with control flow,
  • Code size and instruction count translate into energy, memory, and cost savings.

The work demonstrates that intelligent page assignment—grounded in static control flow analysis and weighted function grouping—can yield substantial reductions in program size for resource-limited architectures. The methodology is directly applicable to modern compilers for microcontrollers, binary layout optimizers, and systems requiring fine-grained control of instruction placement for performance-critical or energy-sensitive applications.
