If fpdf2 is not shaping Khmer text correctly, verify that uharfbuzz is installed and that you have explicitly called pdf.set_text_shaping(True).
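The two checks above can be sketched roughly as follows. This is a minimal illustration, not the guide's own code: the helper names `missing_modules` and `write_shaped_pdf`, the font family label "KhmerFont", and the `font_path` argument (a Khmer-capable TTF file you supply) are all assumptions made for the example.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def write_shaped_pdf(text, font_path, out_path):
    """Write `text` to a PDF with text shaping enabled.

    `font_path` is a hypothetical path to a Khmer-capable TTF font;
    fpdf2 does not ship one, so the caller must provide it.
    """
    # Imported lazily so missing_modules() can run without fpdf2 installed.
    from fpdf import FPDF

    pdf = FPDF()
    pdf.add_page()
    pdf.add_font("KhmerFont", "", font_path)  # register the TTF under a label
    pdf.set_font("KhmerFont", size=14)
    pdf.set_text_shaping(True)  # requires uharfbuzz to be installed
    pdf.write(8, text)
    pdf.output(out_path)


# Report which of the two required packages are absent before generating.
print("Missing packages:", missing_modules(["fpdf", "uharfbuzz"]))
```

If the printed list is empty, both packages are importable and a shaping problem is more likely a font issue than an installation issue.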
(Requirement: you must install the Tesseract Khmer language data pack (khm.traineddata) for this to work.)

4. Summary Checklist for Success
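The Tesseract requirement is easy to verify before running any OCR. The sketch below assumes the `tesseract` executable is on your PATH and relies on its `--list-langs` option; the function name `tesseract_has_language` is made up for this example.

```python
import shutil
import subprocess


def tesseract_has_language(lang="khm"):
    """Return True if the tesseract CLI reports `lang` as installed.

    Returns False when tesseract itself is not on PATH.
    """
    exe = shutil.which("tesseract")
    if exe is None:
        return False
    result = subprocess.run(
        [exe, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some versions print the list to stderr, so inspect both streams.
    listed = (result.stdout + result.stderr).split()
    return lang in listed


if not tesseract_has_language("khm"):
    print("Khmer language data (khm.traineddata) was not found.")
```

A False result means either Tesseract is not installed or khm.traineddata is not in its tessdata directory.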
Processing PDF Files with Python and Khmer Text: A Verified Guide
with pdfplumber.open(pdf_path) as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        if text:
            khmer_segments = khmer_unicode_range.findall(text)
            extracted_text.extend(khmer_segments)
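The extraction loop uses khmer_unicode_range and extracted_text without showing how they are defined. One plausible definition, offered as an assumption rather than the guide's original code, is a compiled regular expression covering the Khmer block (U+1780 to U+17FF) and the Khmer Symbols block (U+19E0 to U+19FF). The helper `find_khmer_segments` is a hypothetical name used only here.

```python
import re

# Assumed definition: one or more consecutive characters from the
# Khmer block (U+1780-U+17FF) or Khmer Symbols block (U+19E0-U+19FF).
khmer_unicode_range = re.compile(r"[\u1780-\u17FF\u19E0-\u19FF]+")


def find_khmer_segments(text):
    """Return every run of consecutive Khmer characters in `text`."""
    return khmer_unicode_range.findall(text)


extracted_text = []
# Mixed Latin and Khmer input; the escapes spell the word "Khmer" in Khmer script.
sample = "PDF \u1781\u17D2\u1798\u17C2\u179A text"
extracted_text.extend(find_khmer_segments(sample))
print(extracted_text)
```

Because spaces and Latin letters fall outside both ranges, each match is a contiguous Khmer run, so a sentence containing an embedded word such as "PDF" is returned as separate segments.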
from pypdf import PdfReader
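Avoid raw canvas operations. Use WeasyPrint or pdfkit (a wkhtmltopdf wrapper) instead: both render HTML through a layout engine that handles complex-script text shaping for you (WeasyPrint via Pango and HarfBuzz, wkhtmltopdf via its Qt WebKit engine).

3. Scrambled Text on Extraction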
ខ្ញុំឈ្មោះភីថុន។ ខ្ញុំកំពុងរៀនអានឯកសារPDF ជាភាសាខ្មែរ។