Spring, Red-Pill, PDF, LLM

Spring is the loveliest time of the year. Appearances can be deceiving; the PDF format might never truly disappear, and “The Matrix” and “Idiocracy” are actually the same movie franchise.
As Team Senticore continues to delve into capturing and reconstructing assembly and inspection data stored within technical publications, here is our evolving view on the future of PDFs, an update on our actual progress, and some inevitable metaphors.
The Matrix allowed sentient machines to generate energy from sleeping humans. These humans occasionally revolted, forcing the machines into a scheduled bloodbath, supported from within by a superhero Manchurian candidate. This inherently unstable construct nearly led to the extinction of both machines and humans. The Architect, true to his promise to the Oracle, simply reduced the population’s IQ to around 70-80, rendering them content and docile. Next, we see a new, mediocre Neo emerging from the basement in the middle of “Idiocracy” to perform some peaceful maintenance.
It would require a massive red-pilling exercise to wake this entire thing up.
Meanwhile, as our dimension dangerously oscillates between nightmarish and moronic versions of the universe, there is an ongoing debate within the engineering and manufacturing community among proponents of a single large platform, a federated system of systems, and even those still operating with Excel. Yet they may all be making a significant mistake by assuming perpetual access to the relevant software and to competent personnel, which may not be the case in either “The Matrix” or “Idiocracy.” If either the digital thread or its human link breaks down, a company can only be “red-pilled” if it: a) has access to the right information, and b) has the right tools to consume it. Here, a long-term archiving strategy based on STEP AP242 (often paired with JT) and PDF, coupled with AI-driven PDF reconstruction, might be the only viable solution.
While AP242 provides access to both CAD data and EBOM, the PDF format is unmatched when dealing with assembly and inspection manuals that can be generated from multiple disparate data sources. Nothing can compete with it in terms of stability, simplicity, and total cost of ownership. However, the inexpensive reconstruction of that data from PDFs into a machine-readable format for downstream consumption by MES and beyond remains a challenge. This is precisely the red pill that Senticore is developing as we speak.
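To make “machine-readable” a little more concrete, here is a minimal Python sketch of the kind of output a reconstruction step might hand to an MES. It is an illustration only, not Senticore's pipeline: the sample text, the `Step` record, its field names, and the regex rules are all hypothetical, and simple pattern matching stands in for the LLM-driven extraction discussed in this post.

```python
import json
import re
from dataclasses import dataclass, asdict
from typing import List, Optional

# Hypothetical text, as it might look after a PDF text-extraction step.
SAMPLE_PAGE_TEXT = """\
3.2 Install pump housing
1. Position gasket P/N 4471-A on the flange.
2. Fit four M8 bolts and torque to 25 Nm.
3. Inspect the seal for visible gaps.
"""


@dataclass
class Step:
    """One assembly or inspection step (hypothetical schema)."""
    number: int
    instruction: str
    torque_nm: Optional[float] = None


# "1. Do something" style lines; section headings such as "3.2 ..." do not match.
STEP_RE = re.compile(r"^\s*(\d+)\.\s+(.*\S)\s*$")
# A number followed by the unit "Nm", e.g. "25 Nm" or "12.5 Nm".
TORQUE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*Nm\b")


def parse_steps(page_text: str) -> List[Step]:
    """Turn numbered instruction lines into structured Step records."""
    steps = []
    for line in page_text.splitlines():
        match = STEP_RE.match(line)
        if match is None:
            continue  # skip headings, blank lines, and other non-step text
        instruction = match.group(2)
        torque = TORQUE_RE.search(instruction)
        steps.append(
            Step(
                number=int(match.group(1)),
                instruction=instruction,
                torque_nm=float(torque.group(1)) if torque else None,
            )
        )
    return steps


def to_json(steps: List[Step]) -> str:
    """Serialize the steps for downstream consumption."""
    return json.dumps([asdict(step) for step in steps], indent=2)


if __name__ == "__main__":
    print(to_json(parse_steps(SAMPLE_PAGE_TEXT)))
```

Real manuals are far messier than this sample (tables, callouts, figures, multi-column layouts), which is exactly why rule-based parsing alone does not scale and the reconstruction problem remains hard.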
Until about a year ago, conveniently extracting engineering and manufacturing text and graphic data from PDFs into a sufficiently machine-readable form was a wild dream. Today, we consider building such a system within reach.
First of all, Senticore is improving the entire pipeline: the LLM selection process, the libraries layered on top of the LLMs, the identification of relevant grounding, prompt engineering, and the supporting infrastructure. For instance, we discovered that which libraries produce the best output depends closely on the hardware, such as the GPU vendor and model, RAM size, and storage I/O.
Second, our target customers prefer locally deployed LLMs, so we are continuously juggling different vendors, models, and versions. Llama 3.3 has become our baseline in terms of features and ease of development. We are now carefully mapping our system to the MES/MRO processes, as this is critical for successful multimodal processing.
Finally, aside from the apocalypse that may or may not occur, we observe many companies in need of a mild red-pill treatment to optimize their MES/MRO operations. If you are currently expending blood, treasure, or Brawndo on inefficient handling of publications, Morpheus is only a phone call away.