PDF Handling is the process of converting a source PDF file into a translation-compatible file format while preserving reasonable layout similarity and text integrity relative to the source file.
It differs from Desktop Publishing (DTP) mainly in the expected effort and complexity of layout adaptation and construction: the translated layout will closely resemble the source, but it is obtained through simpler processes because the source document is usually simpler. A document sent for PDF Handling is not necessarily intended for publication, printing, or electronic distribution, and does not necessarily have to meet the high layout standards expected of public-facing material.
Why should I request PDF Handling?
The need to work from a PDF usually arises because editable source files are unavailable or because the document is a scan. Because PDFs are not directly editable, they present a challenge for translation. This service can be requested in cases where:
- the only available document is a non-editable PDF that needs to be translated and shared (e.g. inside an organization)
- conversion into another document format is required
- the PDF is a scanned document
- the PDF is a handwritten document
What can be done with PDF Handling?
The Unbabel team can handle PDFs efficiently, using techniques like OCR (Optical Character Recognition) to extract text from scanned documents and ensure accurate translations despite the non-editable format. The output of the conversion is an editable format (see below).
Depending on the complexity of the source and the OCR output, various adjustments to the generated source file may be required. These include formatting simplification (e.g. merging of styles/cells), aggressive tag handling (e.g. merging of Microsoft Office tags), and restructuring of content.
Different options concerning the nature of the output file are available:
- Convert back to PDF: in the standard scenario, a PDF is converted into an editable format: MS Word (DOCX), PowerPoint (PPTX), plain text (TXT), or Microsoft Excel (XLSX). Please specify your requirements in the service instructions.
- Fix layout before delivery: Unbabel reviews translated materials that have undergone PDF/layout work to check their accuracy.
- Only check PDF Layout:
Supported file formats
PDF is the only supported format for this service.
What do I need to provide?
Apart from the source file, you should provide clear instructions (e.g. the desired output format), guidelines, or any other relevant information.
How to request PDF Handling
Like other services, PDF Handling is requested in step 4 of project creation. You can toggle the service on to set further preferences. Further down, you'll see the Upload files button, which you should use to upload any supporting files.
If all files in the project are eligible for File Engineering, you can also tick the respective checkbox to toggle the service on for all files, and propagate the instructions written in the text field.
The deliverables consist of fully translated and formatted documents, customized to meet the client's requirements. Typically, these include a PDF export of the translated/checked content, or an editable file based on the output of the conversion: MS Word (DOCX), PowerPoint (PPTX), plain text (TXT), or Microsoft Excel (XLSX).