IBM Leverages AI to Transform COBOL Code into Java

COBOL, short for Common Business-Oriented Language, is one of the oldest programming languages still in use, dating back to 1959. It remains remarkably widespread: a 2022 survey found more than 800 billion lines of COBOL running in production systems, up sharply from an estimated 220 billion in 2017.

Despite its longevity, COBOL has a reputation as a hard-to-navigate, inefficient language. Yet for large organizations, migrating to a newer language tends to be complex and expensive, largely because COBOL experts are scarce. The Commonwealth Bank of Australia, for instance, replaced its core COBOL platform in 2012 after a five-year effort that cost more than $700 million.

To help modernize COBOL applications, IBM has introduced Code Assistant for IBM Z, which uses an AI model to translate COBOL code into Java. The tool will launch in preview at IBM’s TechXchange conference in Las Vegas this September, with general availability scheduled for Q4 2023.

Code Assistant for IBM Z is meant to help businesses refactor their mainframe applications while maintaining performance and security, according to IBM Research Chief Scientist Ruchir Puri. The tool can run on-premises or as a managed cloud service, and it is built on a code-generating model, CodeNet, which can understand COBOL, Java, and approximately 80 other programming languages.

“IBM developed a state-of-the-art generative AI code model to convert legacy COBOL programs into enterprise-grade Java with high fidelity,” Puri shared in an email interview. “Beyond code conversion, Code Assistant encompasses the entire application modernization lifecycle, enabling developers to understand, refactor, transform, and validate the transitioned code within a contemporary architecture.”

Puri explains that CodeNet has 20 billion parameters and was trained on 1.5 trillion tokens. Its large context window of 32,000 tokens lets it take in more of the surrounding code at once, which in turn supports a more efficient COBOL-to-Java conversion. Here, parameters are the parts of a model learned from its training data, tokens are the chunks of raw text the model reads and produces, and the context window is the amount of text the model considers before generating new content.
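To make the token and context-window terminology concrete, here is a minimal Java sketch. It is an illustration only: the class and method names are hypothetical, the whitespace tokenizer is a stand-in for the subword tokenizers that real models use, and nothing here reflects how CodeNet itself is implemented.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: names are hypothetical and unrelated to IBM's tooling.
class ContextWindowSketch {

    // Naive stand-in for a real tokenizer: splits the text on whitespace.
    // Production models use subword tokenizers instead.
    public static List<String> tokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Keeps only the most recent maxTokens tokens, mimicking how a model
    // can only "see" the text that fits inside its context window.
    public static List<String> fitToWindow(List<String> tokens, int maxTokens) {
        if (maxTokens <= 0) {
            return List.of();
        }
        int start = Math.max(0, tokens.size() - maxTokens);
        return tokens.subList(start, tokens.size());
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("MOVE WS-TOTAL TO WS-OUT. ADD 1 TO WS-COUNT.");
        System.out.println(tokens.size() + " tokens in the input");
        System.out.println("Visible with a 4-token window: " + fitToWindow(tokens, 4));
    }
}
```

With a small window, the earlier statement falls out of view entirely. A larger window, such as the 32,000 tokens cited above, leaves room for much more of a program to stay in view at once.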

Various tools and services can already convert COBOL applications to Java syntax, some of them fully automatically. Puri acknowledges this, but asserts that Code Assistant stands apart from competing solutions by preserving COBOL’s inherent capabilities while optimizing costs and producing maintainable code.
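To give a sense of what translating COBOL into maintainable Java can involve, here is a small hand-written sketch. It is not output from Code Assistant for IBM Z or any other converter: the COBOL fragment in the comment, the class name, and the method name are all hypothetical. It shows one common concern, which is keeping COBOL's fixed-point decimal arithmetic intact by using `BigDecimal` rather than `double`.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Hand-written illustration; not generated by any conversion tool.
class InterestCalculator {

    /*
     * Hypothetical COBOL source being translated:
     *
     *   01 WS-BALANCE   PIC 9(7)V99.
     *   01 WS-RATE      PIC 9V9(4).
     *   01 WS-INTEREST  PIC 9(7)V99.
     *
     *   COMPUTE WS-INTEREST ROUNDED = WS-BALANCE * WS-RATE.
     */
    public static BigDecimal computeInterest(BigDecimal balance, BigDecimal rate) {
        // BigDecimal preserves exact decimal arithmetic, whereas double
        // would introduce binary floating-point error.
        // COBOL's ROUNDED phrase rounds ties away from zero by default,
        // which corresponds to RoundingMode.HALF_UP; the scale of 2
        // matches the two decimal places of WS-INTEREST.
        return balance.multiply(rate).setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal interest = computeInterest(new BigDecimal("1000.00"), new BigDecimal("0.0375"));
        System.out.println("Interest: " + interest);
    }
}
```

A translation that only mapped syntax might reach for `double` here and silently change rounding behavior, which is one reason translated code still needs to be validated against the original program's results.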

“IBM designed Code Assistant for IBM Z to seamlessly integrate COBOL and Java services,” Puri explained. “If the system's understanding and refactoring capabilities determine that a specific application sub-service should remain in COBOL, it will do so while transforming other sub-services into Java.”

Code Assistant is not infallible, however. A recent Stanford study found that software engineers who use similar AI code-generating systems are more likely to introduce vulnerabilities into their applications. Puri emphasizes that any code generated by Code Assistant should be reviewed by a human before it is deployed.

“Like all AI systems, Code Assistant for IBM Z may not yet be equipped to handle unique usage patterns within an enterprise’s COBOL applications,” Puri advised. “It’s critical to scan the code with advanced vulnerability scanners to ensure security.”

Despite these risks, IBM sees tools like Code Assistant as vital to its growth. About 84% of IBM’s mainframe clients still run COBOL, predominantly in the financial and government sectors. While the mainframe division remains a significant segment of IBM's business, the company envisions it as a pathway to broader, lucrative hybrid computing environments.

IBM is also pursuing advancements in generative AI tools, positioning itself to compete with platforms like GitHub Copilot and Amazon CodeWhisperer. In May, IBM launched fm.model.code within its Watsonx AI service, powering Watson Code Assistant, which allows developers to generate code from plain English prompts across various programs, including Red Hat’s Ansible Lightspeed.
