Why can content copied and pasted from PDF or Word look different in XWiki?

Last modified by Marius Dumitru Florea on 2020/09/08

The format (block or inline elements) cannot be completely the same because the copied text needs to integrate well with the wiki skin. The "semantic format" should be kept (e.g. heading, paragraph, list, bold, italic, etc.).

When you copy from a tool such as a PDF viewer or a Word editor, that tool is responsible for putting the copied content on the clipboard in multiple formats (plain text, HTML, URL, binary, etc.). When you paste into the WYSIWYG editor, the browser uses the most appropriate format found on the clipboard (e.g. it looks for HTML, falling back on plain text or URL), but the actual content is whatever the source tool put there. As a consequence, most of the limitations come from that tool: if it doesn't include some styles, the browser and the WYSIWYG editor can't do anything about it. Such tools usually provide a feature to export the content as HTML. You can use this feature to see how close to the original the HTML version of the content can be.
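The format fallback described above can be illustrated with a minimal sketch. It is not the browser's or the WYSIWYG editor's actual code: the `ClipboardFormats` type, the `PREFERRED_TYPES` order and the `pickClipboardFormat` function are hypothetical names introduced here, and the clipboard is modelled as a plain map from MIME type to the payload the source tool provided.

```typescript
// Hypothetical model of the clipboard: MIME type -> payload written by the source tool.
type ClipboardFormats = Record<string, string>;

// Assumed preference order (HTML first, then plain text, then URL).
// Real browsers and editors may use a different order.
const PREFERRED_TYPES: readonly string[] = ["text/html", "text/plain", "text/uri-list"];

// Returns the first non-empty format found in the preference order,
// or undefined when the clipboard holds none of them.
function pickClipboardFormat(
  formats: ClipboardFormats
): { type: string; data: string } | undefined {
  for (const type of PREFERRED_TYPES) {
    const data = formats[type];
    if (data !== undefined && data.length > 0) {
      return { type, data };
    }
  }
  return undefined;
}
```

Whichever format is picked, the payload is exactly what the source tool wrote: if that tool's HTML omits a style, no choice made at paste time can bring it back.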

Besides that, the WYSIWYG editor does its own filtering in order to ensure:

  • that the HTML can be converted to wiki syntax, because the content is ultimately saved as wiki syntax. If the wiki syntax cannot express some styles (e.g. the XWiki 2.1 syntax doesn't support inline styles for list items) then those styles are dropped.
  • that the pasted content integrates well with the rest of the content and the Look & Feel (skin, color theme) of the wiki.
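As an illustration of the first kind of filtering, here is a small sketch that drops inline styles from elements that the target syntax cannot style. It is not the editor's real filter: `PastedNode`, `TAGS_WITHOUT_INLINE_STYLE` and `dropUnsupportedStyles` are hypothetical names, the pasted HTML is reduced to a simplified tree, and the only rule encoded is the list item example mentioned above.

```typescript
// Hypothetical, simplified representation of a pasted HTML element.
interface PastedNode {
  tag: string;
  style?: string;
  text?: string;
  children?: PastedNode[];
}

// Assumption for this sketch: only list items cannot carry inline styles.
const TAGS_WITHOUT_INLINE_STYLE: readonly string[] = ["li"];

// Returns a copy of the tree in which the style is removed from every
// element whose tag is listed in TAGS_WITHOUT_INLINE_STYLE.
function dropUnsupportedStyles(node: PastedNode): PastedNode {
  const result: PastedNode = { tag: node.tag };
  if (node.text !== undefined) {
    result.text = node.text;
  }
  const styleAllowed = TAGS_WITHOUT_INLINE_STYLE.indexOf(node.tag) < 0;
  if (node.style !== undefined && styleAllowed) {
    result.style = node.style;
  }
  if (node.children !== undefined) {
    result.children = node.children.map(dropUnsupportedStyles);
  }
  return result;
}
```

The function returns a new tree rather than modifying its input, so the original pasted content stays available for comparison with the filtered result.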
