The PDF Highway to the Digital Thread

Loving mankind is easy when you’re independently wealthy, and embracing progress is straightforward without legacy data to contend with. On the other hand, loving your neighbor and wisely connecting your present to your past to enable your future is considerably harder.
The practice of shaping public opinion through a specific version of the past is age-old. Whether the aim is to foster cohesion and motivation or, conversely, to undermine morale and destroy resistance, similar methods are employed: lionizing, fictionalizing, omission, and distortion. Indeed, there is a compelling reason why Joseph Stalin dubbed writers “engineers of human souls,” why Vladimir Lenin identified cinema as the most influential modern art form, and why Hollywood has been blamed for aiding and abetting the contemporary American decline.
China expressly fights against anyone disparaging its past or culture, and it consciously promotes its own positive image, though it does occasionally point to unworthy rulers and officials as examples to avoid. As China gears up for a global showdown, it produces beautifully crafted movies that almost explicitly call for revenge against Japan and the collective West. In contrast, for every pro-American franchise, Hollywood makes dozens of movies painting Western civilization as inherently rotten and hardly worth fighting for.
By the way, it’s particularly galling to hear the hand-wringing about China’s booming industrial sector versus the dismal state of American manufacturing coming from the very same people who once championed offshoring under the guise of the greater good, all while decimating their own communities with DEI and ESG agendas.
Engineering and manufacturing processes, in contrast to human politics, consistently strive for better outcomes. In these domains, examining legacy data is crucial both for compliance and for extracting valuable insights for future designs, including learning from and avoiding costly past mistakes. Reviewing the past is easy with machine-readable formats such as databases, STEP/JT, ISO-standardized MS Office file formats, or S1000D. The difficulty increases dramatically with API-generated PDFs, and the process becomes excruciating with PDFs from low-quality scans suffering from discoloration, handwritten notes, and the like.
Just as many Asian economies bypassed earlier stages to fully embrace Model-Based Systems Engineering, they are also driving forward with modern Interactive Electronic Technical Manuals built on specifications like S1000D for their technical publications. Meanwhile, Americans continue to grapple with decades-old data for critical equipment – electrical distribution gear, tanks, aircraft, ships, oil rigs, and more – often trapped in those cumbersome PDF scans containing vital part numbers, effectivities, and notes on custom features and configurations.
Several years ago, Senticore tackled the problem of reconstructing data from PDF scans using convolutional neural networks. The quality was reasonably good, but the speed was a major bottleneck. As generative AI took off, Senticore boarded the infographic train and pivoted to connecting the PDF technical pubs universe to the digital thread. Their current approach, combining locally deployed multimodal LLMs with neural networks and human-assisted AI, offers significant potential for these types of tasks.
Naturally, there are balancing acts to play. How much effort should we allocate to developing the large language model component versus more specialized neural networks? What specific human-assistance features should we provide to blue-collar shop floor personnel to validate and refine AI-driven output? How can we accelerate access to downstream use cases such as impact analysis?
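To make the human-assistance question concrete, here is a minimal sketch of one possible triage step, not a description of Senticore’s actual pipeline. It assumes a hypothetical extraction stage that attaches a confidence score to every field it reads from a scanned page; the `ExtractedField` type, the `triage` function, the sample values, and the 0.9 threshold are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    """One value read from a scanned page (hypothetical structure)."""
    page: int
    name: str          # e.g. "part_number", "effectivity", "note"
    value: str
    confidence: float  # 0.0 to 1.0, assumed to come from the extraction model


def triage(fields, threshold=0.9):
    """Split fields into auto-accepted ones and ones needing human review."""
    accepted, review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            review.append(field)
    # Put the least certain fields first so reviewers see the hardest cases early.
    review.sort(key=lambda field: field.confidence)
    return accepted, review


# Illustrative values only; not taken from any real manual.
fields = [
    ExtractedField(page=3, name="part_number", value="AN960-416", confidence=0.98),
    ExtractedField(page=7, name="note", value="custom flange, see dwg", confidence=0.81),
    ExtractedField(page=3, name="effectivity", value="S/N 0101-0250", confidence=0.62),
]

accepted, review = triage(fields)
for field in review:
    print(f"Review p.{field.page} {field.name}: {field.value!r} ({field.confidence:.2f})")
```

Where to set the threshold is itself one of the balancing acts: a higher value sends more fields to shop floor reviewers, while a lower one trusts the models more.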
While you cannot speed up the process of giving birth to a baby by involving multiple mothers, you can definitely harness greater compute power to accelerate research and development on large language models. For instance, a rig with two NVIDIA RTX 5090 graphics cards delivers six to ten times better performance on PDF reconstruction tasks compared to our already decent hardware baseline. Incidentally, based on that experience, the American restrictions on China’s access to advanced AI-capable chips start making sense with regard to maintaining momentum in the grand AI race.
The sword predates the pen, yet very often the pen is mightier than the sword. Projecting cultural confidence and operating a comprehensive digital thread that connects to manufacturing, maintenance, and inspection pubs are major strategic advantages in the upcoming clash of clans.
The future of technical documentation is arguably interactive, and it is easy to say that everyone must move in that direction as fast as possible, yet for many companies this may be as thorny as loving one’s neighbor. Senticore’s AI-based PDF reconstruction technology extracts precious pieces of engineering and manufacturing information quickly, reliably, and at reasonable cost. It provides an entrance ramp for both “good” and “bad” PDF tech pubs onto that future highway; organizations can avoid big-banging their existing processes and transition to interactive manuals at their own pace.
For all the fears about AI, it holds enormous potential for igniting an American industrial revival. If your organization wants to bring its mountains of PDFs into the digital thread and you are unsure where to begin, reach out to us. With today’s technology, this is Mission Possible.