Cursor for LibreOffice, Week 4-6

Refactoring into Pure State Machines, Nested Tool-Calling, and Translation into 8 Languages

After the previous week I was feeling good about the ACP integration, the research sub-agent, talk to your document, and surviving Quarzadous’s refactor. In common scenarios, the whole thing usually just…worked.

However, one day after I pushed the latest build to GitHub and the LibreOffice extension site, my most active (and very helpful) user posted that he couldn’t uninstall or reinstall the extension. That’s sure to make happy customers!!

It turned out to be user error (he was trying to install the source ZIP instead of the OXT), but I realized I wanted a system that would let me sleep at night, knowing the extension wouldn’t break in some basic scenario. The code was organized and clean, but the plugin’s complexity had grown over time to support the larger feature set.

Some functions were long and performed complicated state changes, so testing every possible combination (stop clicked mid-stream, max rounds exhausted, speech-to-text transcription fallback mode, document mutation after a tool, etc.) was almost impossible without spinning up a LibreOffice instance and writing intricate tests. Each unit test only sampled the state space, and I realized that if I didn’t break things up into smaller functions, the test code would end up larger and more complicated than the extension itself.

I didn’t have any boss demanding new features, so I spent some time researching modern tools for formal verification in Python:

| Tool | What it does |
| --- | --- |
| Type checking tools (Pyright, Mypy, Pyre, Pytype) | Analyze your code without running it to find calls to methods that don’t exist on an object, mismatched argument types, and similar bugs, catching them early. |
| Deal | A lightweight library that lets you write simple rules (contracts) like “this function must receive a positive number”. It checks these rules while the program runs, so violations are reported in terms of the rule that was broken. |
| CrossHair | Uses a mathematical engine (Z3) to explore many possible ways your code could run, automatically generating test cases and proving that your contracts hold, or pointing out where they could fail. |
| PyExZ3 | Provides a bridge to the Z3 solver so you can write custom checks that reason about your code’s logic, letting you verify complex conditions beyond simple type checks. |

I decided the first step was to break up the complicated loops into pure state machines. Here is the tool-loop FSM:

The state machine loops have no threads, no mutable instance variables, and no internal side effects. Just data in → new state + list of effects out. The code still makes all the same UNO calls as before, but breaking it up into pure state machines and smaller functions makes it much easier to reason about and test. It’s good I made each of the loops simple and reliable, because if you combine all of them, it becomes complicated:

The unit tests in test_tool_loop_state.py are now simple, deterministic, and run outside LibreOffice, and this refactor is the foundation for formal verification. The sidebar behaves the same, but under the hood the scariest parts of WriterAgent are now cleaner and more predictable.
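To make the "data in → new state + list of effects out" idea concrete, here is a minimal sketch of a pure transition function. It is not the actual WriterAgent code: the names `LoopState`, `step`, the event strings, and the effect tuples are all made up for illustration.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Tuple

Effect = Tuple[str, Any]  # hypothetical: (effect name, payload)


@dataclass(frozen=True)
class LoopState:
    """Immutable snapshot of a (hypothetical) tool loop."""
    round_no: int = 0
    max_rounds: int = 3
    status: str = "running"  # "running", "stopped", or "done"


def step(state: LoopState, event: str) -> Tuple[LoopState, List[Effect]]:
    """Pure transition: no I/O, no mutation. Returns (new_state, effects)."""
    if state.status != "running":
        return state, []  # terminal states ignore further events
    if event == "stop_clicked":
        return replace(state, status="stopped"), [("cancel_stream", None)]
    if event == "tool_result":
        next_round = state.round_no + 1
        if next_round >= state.max_rounds:
            done = replace(state, round_no=next_round, status="done")
            return done, [("show_message", "max rounds reached")]
        return replace(state, round_no=next_round), [("send_request", next_round)]
    return state, []  # unknown events change nothing
```

Because `step` only returns descriptions of effects, a separate thin layer can carry out the real UNO or network calls, and a test can check scenarios like "stop clicked mid-stream" by comparing plain values.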

I also now have quite a bit of test coverage, 700 tests, so I can generally make changes and ship updates without worrying as much that some simple bug will bite someone, somewhere. I didn’t set out with a goal of having so many tests, but at one point I added a rule that told the AIs to create tests for every feature and bug fix, and they kept accumulating.

Type Checking

def unpack(t: Union[FsmTransition[StateT], Tuple[StateT, List[Any]]]) -> FsmTransition[StateT]:
    """Normalize legacy ``(state, effects)`` tuples to :class:`FsmTransition`."""
    if isinstance(t, FsmTransition):
        return t
    state, effects = t
    return FsmTransition(state=state, effects=list(effects))

I’m generally not a big fan of type checking in Python. It can sometimes require a lot more effort on the keyboard, and make function declarations as ugly as C++.

However, it’s basically necessary for formal verification since if a system doesn’t know that a function requires integers, it will waste a lot of time trying strings and the other types to verify it works reliably, for cases that will never happen.

I also ran into a bug where code in an unusual case was calling a method that didn’t even exist on the object. This is exactly the kind of problem that type checking was made for. Python lets you write code, and only at runtime will it flag these errors.

In many cases for small projects, which are most of them, type checking isn’t necessary. These bugs can be caught when you actually use the code. Calling the wrong method name is usually an easy fix. However, when a codebase gets above 10,000 lines of code, it starts to have more special cases, so type checking becomes worthwhile.

I researched the most popular type checkers, and it seems like the cool kids are using Ty. It’s new, modern, and written in Rust so very fast. I think Rust is a byzantine language that makes C++ look easy to read so I wouldn’t touch it in my code, but I’d be happy to read the error messages it dumps to the screen.

Ty initially found 1000 errors, but when I figured out how to trim out the contributed code (presumed to be stable) and the test code, it was just 400. That was still a lot of problems, but I just started plugging away at it.

The biggest issue was that my dev environment didn’t have type definitions for UNO, so I figured out how to load them into my local environment. It also needed Protocol classes, which are an interesting feature. In Python, they let me say: “this function doesn’t care what type you give it, as long as it supports these methods.” You specify in the Protocol which methods you require, and the type checker verifies that only those are called.
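As an illustration of Protocol classes, here is a small sketch. `getString` and `setString` are real UNO text range methods, but `SupportsText`, `uppercase_in_place`, and `FakeRange` are hypothetical names invented for this example.

```python
from typing import Protocol


class SupportsText(Protocol):
    """Anything with these two methods is acceptable; no inheritance needed."""

    def getString(self) -> str: ...
    def setString(self, text: str) -> None: ...


def uppercase_in_place(target: SupportsText) -> None:
    # A type checker would flag any call here that is not declared above.
    target.setString(target.getString().upper())


class FakeRange:
    """Test double: never mentions SupportsText, but matches its shape."""

    def __init__(self, text: str) -> None:
        self._text = text

    def getString(self) -> str:
        return self._text

    def setString(self, text: str) -> None:
        self._text = text
```

The same function then accepts a real UNO text range inside LibreOffice and a `FakeRange` in a unit test running outside it.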

I was happy to get it working with Ty, but then I thought, why not just try it out with mypy, which is considered the trusted OG of type checkers? It found a few more areas. For example, it is more strict about calling methods on potentially None variables:

# Before (ty accepts, mypy rejects)
def get_page_count(self):
    page = self.get_active_page()
    return page.getCount()  # mypy: Item "None" has no attribute "getCount"

# After (both accept)
def get_page_count(self):
    page = self.get_active_page()
    if page is None:
        return 0
    return page.getCount()

After fixing those few new problem areas, I installed Pyright. It found a few more issues, and I fixed those too. So now make build runs the fast Ty checker, and make test / make release run all three.

Specialized Tools

Writer has a ton of features and UNO surface area: tables, styles, text-boxes, shapes, charts, indexes, fields, embedded objects, track changes, etc. Dumping every tool into the main chat prompt would bloat context, and even frontier models like Claude Opus would fail to make good decisions.

It’s easy to build a plugin that supports a small subset of the LibreOffice API. Building one that understands the full fidelity of LibreOffice is harder, but that was what I wanted. In fact, I had stopped adding richer Writer support to the codebase because the current toolset was already too large for smaller models, and I didn’t want to keep making the problem worse.

One way to solve the tool proliferation problem is “fat” API design. The alternative, fine-grained (“skinny”) APIs, gives each operation its own tool: create_footnote, edit_footnote, delete_footnote, etc.

Skinny tools have simpler parameter schemas, are easier to map directly to the underlying UNO calls, and need simpler validation logic. However, they cause the tool count to explode.

So one possibility is to create APIs that combine related operations into broader, multi-purpose “fat” tools. Example: manage_footnotes(action = ‘create’, ‘edit’, ‘delete’, …)

  • Pros: Drastically reduces the total number of tools, limiting context size. A polymorphic schema allows more capabilities to remain in the main chat prompt, potentially eliminating the need for the sub-agent delegation pattern.
  • Cons: The parameter schemas become extremely large and complex (e.g., union types or nested generic objects). LibreOffice operations are highly disparate, making a unified underlying Python handler harder to write, and smaller LLMs often struggle to reliably handle the union parameters correctly.

Ultra-Fat API (Single manage_shapes Tool):

{
  "name": "manage_shapes",
  "parameters": {
    "action": {"type": "string", "enum": ["create", "edit", "delete"]},
    "shape_index": {"type": "integer", "description": "Target shape (for edit/delete)"},
    "shape_type": {"type": "string", "enum": ["rectangle", "ellipse", "text", "line"], "description": "Required for create"},
    "geometry": {
      "type": "object", 
      "properties": {"x": {"type": "integer"}, "y": {"type": "integer"}, "width": {"type": "integer"}, "height": {"type": "integer"}}
    },
    ...
  }
}
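To show why the handler behind a fat tool gets harder to write, here is a rough sketch of what one for the schema above might look like. It is an assumption, not code from the repo: a plain in-memory list stands in for the document, and the handler signature and return strings are invented.

```python
from typing import Any, Dict, List

SHAPE_TYPES = {"rectangle", "ellipse", "text", "line"}


def manage_shapes(shapes: List[Dict[str, Any]], args: Dict[str, Any]) -> str:
    """Hypothetical fat handler: one entry point, per-action validation."""
    action = args.get("action")
    if action == "create":
        # shape_type is only required for this action
        if args.get("shape_type") not in SHAPE_TYPES:
            raise ValueError("create requires a valid shape_type")
        shapes.append({"type": args["shape_type"],
                       "geometry": dict(args.get("geometry", {}))})
        return f"created shape {len(shapes) - 1}"
    if action in ("edit", "delete"):
        # shape_index is only required for these actions
        index = args.get("shape_index")
        if not isinstance(index, int) or not 0 <= index < len(shapes):
            raise ValueError(f"{action} requires a valid shape_index")
        if action == "delete":
            shapes.pop(index)
            return f"deleted shape {index}"
        shapes[index]["geometry"].update(args.get("geometry", {}))
        return f"edited shape {index}"
    raise ValueError(f"unknown action: {action!r}")
```

Which parameters are required depends on the value of `action`. A flat JSON schema cannot express that well, so the rule ends up in the description text and in handler code like this.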

I decided to stick with the simple APIs for now, and create a two-level toolset, leveraging what I did for the web research subagent, which defines its own set of specialized tools (web_search, visit_webpage).

The LLM now sees a basic set of tools. For Writer they are:

| Function | Purpose | Key Parameters |
| --- | --- | --- |
| apply_document_content | Insert or overwrite content in the document. | content (list of HTML strings), target (beginning, end, selection, full_document, search), old_content (text to find when target='search'), all_matches (bool). |
| get_document_content | Retrieve the current document (or a selection/range). | scope (full, selection, range), max_chars, start, end. |
| get_document_stats | Get high‑level statistics (characters, words, paragraphs, pages, headings). | No parameters |
| get_document_tree | Return the heading outline (or full tree) of the document. | content_strategy (heading_only, first_lines, ai_summary_first, full), depth. |
| search_in_document | Search for a string or regex inside the document. | pattern, regex, case_sensitive, max_results, context_paragraphs, return_offsets. |
| add_comment | When the user asks to “review” or “give feedback” on a document | anchor text, string |
| styles_apply | Apply a paragraph style to a target location. | style_name, target (beginning, end, selection, full_document, search), old_content. |
| delegate_to_specialized_writer_toolset | Hand off a complex Writer task to a sub‑agent that has a focused toolset (tables, charts, shapes, images, web research, etc.). | domain (styles, page, embedded, shapes, charts, indexes, fields, bookmarks, tracking, images), task (free‑form description). |

The main chat sees a compact core plus one gateway tool. When the model calls the gateway with a domain and task, it switches into a focused agent mode that exposes only that domain’s specialized tools. When the agent is done, it calls the specialized_workflow_finished tool to return control to the main agent and its general toolset.
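The switching logic can be sketched as two small pure functions. This is an assumption about the shape of the mechanism, not the extension’s actual code. The tool lists are shortened, and the specialized tool names (`create_shape`, `insert_field`, and so on) are hypothetical.

```python
from typing import Any, Dict, List, Optional

GATEWAY_TOOL = "delegate_to_specialized_writer_toolset"
FINISH_TOOL = "specialized_workflow_finished"

CORE_TOOLS = ["get_document_content", "apply_document_content", GATEWAY_TOOL]

# Hypothetical specialized tool names, grouped by domain.
SPECIALIZED_TOOLS: Dict[str, List[str]] = {
    "shapes": ["create_shape", "edit_shape", "delete_shape"],
    "fields": ["insert_field", "update_fields"],
}


def visible_tools(domain: Optional[str]) -> List[str]:
    """Tools the model sees: the core set, or one domain plus the exit tool."""
    if domain is None:
        return list(CORE_TOOLS)
    return SPECIALIZED_TOOLS[domain] + [FINISH_TOOL]


def next_domain(domain: Optional[str], tool_name: str,
                args: Dict[str, Any]) -> Optional[str]:
    """Return the active domain after a tool call (None means main chat)."""
    if domain is None and tool_name == GATEWAY_TOOL:
        requested = args["domain"]
        if requested not in SPECIALIZED_TOOLS:
            raise ValueError(f"unknown domain: {requested!r}")
        return requested
    if domain is not None and tool_name == FINISH_TOOL:
        return None
    return domain  # any other call leaves the mode unchanged
```

Under this design the model never sees more than one domain’s tools at a time, which keeps the prompt small no matter how many domains are added.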

I was happy to discover this solution, because it lets me work toward full fidelity with LibreOffice over time while still working well with smaller, dumber local models.

Localization Support

My first active user was a friendly and helpful German named Samuel. He could speak English, but I could tell he was much more comfortable in his native language, so I thought, why not translate this little plugin into German and some of the other popular languages? I already had code to talk to LLM endpoints, and many of them speak dozens of languages. I just needed to hand them strings and ask.

The code itself didn’t have any localization support yet, so I had to work on that first. The most time-consuming part was going through every string in the codebase, deciding whether it was user-visible, and if so, swapping in the translated string based on the user’s language.

In Python, the convention is to create a little function called “_”:

def _(message: str) -> str:
    """Translate English msgid *message* via gettext. Must be :class:`str`."""
    if not isinstance(message, str):
        raise TypeError("gettext msgid must be str")

    global _translation
    if _translation is None:
        init_i18n()

    assert _translation is not None
    return _translation.gettext(message)

Everywhere in the code where you might display a string such as “Transcribing audio…”, you simply wrap it in an underscore and parentheses, like this: _("Transcribing audio...") to auto-translate the string.
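The snippet above relies on an init_i18n() that isn’t shown. As a rough idea of what loading a catalog could involve, here is a sketch using the standard library’s `gettext.translation`. The domain name `"writeragent"`, the `load_translation` helper, and the locale directory are assumptions for illustration, not the extension’s real values.

```python
import gettext
from pathlib import Path
from typing import Union


def load_translation(lang: str, localedir: Union[str, Path],
                     domain: str = "writeragent") -> gettext.NullTranslations:
    """Load <localedir>/<lang>/LC_MESSAGES/<domain>.mo if it exists.

    With fallback=True, a missing catalog yields a NullTranslations object
    whose gettext() returns the English msgid unchanged, so the UI never
    breaks just because a language has not been translated yet.
    """
    return gettext.translation(
        domain,
        localedir=str(localedir),
        languages=[lang],
        fallback=True,
    )


# Usage: no catalog at this (made-up) path, so the English text comes back.
translation = load_translation("de", "/nonexistent/locale")
message = translation.gettext("Transcribing audio...")
```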

GNU gettext has a tool, xgettext, that collects all the strings marked for translation and puts them into a central template (POT) file. Once I had it mostly working, I set up an automated process that spins up multiple threads to translate the strings in batches. It currently supports Spanish, French, Portuguese, Russian, German, Japanese, Italian and Polish, which covers about 3 billion people, and it’s simple to add more.
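The batching idea can be sketched with a thread pool. This is a guess at the general shape, not the actual pipeline. `translate_batch` is a hypothetical callable standing in for the LLM request, and the batch size and worker count are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def chunk(items: List[str], size: int) -> List[List[str]]:
    """Split items into consecutive batches of at most `size` strings."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def translate_all(msgids: List[str],
                  translate_batch: Callable[[List[str]], List[str]],
                  batch_size: int = 20,
                  workers: int = 4) -> Dict[str, str]:
    """Translate msgids in parallel batches; returns {msgid: translation}."""
    batches = chunk(msgids, batch_size)
    result: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input batches
        for batch, translated in zip(batches, pool.map(translate_batch, batches)):
            if len(translated) != len(batch):
                raise ValueError("batch came back with a different number of strings")
            result.update(zip(batch, translated))
    return result
```

The length check matters because a model may merge or drop strings in a batch, and it is better to fail loudly than to write misaligned translations into a PO file.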

Where We Stand Now

The codebase is more reliable, the state machines are verifiable, localization is automatic, and the main chat agent stays fast and focused while delegating to specialized agents. Over time, this design will let me expose the full power of LibreOffice.

None of this would have been possible without the incredible FOSS ecosystem: deal, CrossHair, smolagents, polib, Hermes-Agent, and other FOSS codebases, and of course the LibreOffice UNO bridge that I treat as sacred and bug-free for purposes of plugin verification. The repo is here: https://github.com/KeithCu/writeragent. Please try it out and give patches or stars ⭐.
