Replacing text in Apache POI XWPF

Question

I just found the Apache POI library, which is very useful for editing Word files from Java. Specifically, I want to edit a DOCX file using Apache POI's XWPF classes. I could not find any suitable method or documentation showing how to do this. Can somebody explain, step by step, how to replace some text in a DOCX file?

** The text may be in a line/paragraph or in a table row/column

Thanks in advance :)


Gagravarr · Accepted Answer

The method you need is XWPFRun.setText(String). Simply work your way through the file until you find the XWPFRun of interest, work out what you want the new text to be, and replace it. (A run is a sequence of text with the same formatting.)

You should be able to do something like:

    XWPFDocument doc = new XWPFDocument(OPCPackage.open("input.docx"));

    // Paragraphs in the document body
    for (XWPFParagraph p : doc.getParagraphs()) {
        List<XWPFRun> runs = p.getRuns();
        if (runs != null) {
            for (XWPFRun r : runs) {
                String text = r.getText(0);
                if (text != null && text.contains("needle")) {
                    text = text.replace("needle", "haystack");
                    r.setText(text, 0);
                }
            }
        }
    }

    // Paragraphs inside table cells
    for (XWPFTable tbl : doc.getTables()) {
        for (XWPFTableRow row : tbl.getRows()) {
            for (XWPFTableCell cell : row.getTableCells()) {
                for (XWPFParagraph p : cell.getParagraphs()) {
                    for (XWPFRun r : p.getRuns()) {
                        String text = r.getText(0);
                        if (text != null && text.contains("needle")) {
                            text = text.replace("needle", "haystack");
                            r.setText(text, 0);
                        }
                    }
                }
            }
        }
    }

    doc.write(new FileOutputStream("output.docx"));

Kyle Willkomm · Answer

Here is what we did for text replacement using Apache POI. We found that working at the run level was not worth the hassle, and that it is simpler to replace the text of an entire XWPFParagraph instead of a single run. A run can be split at an arbitrary point in the middle of a word, because Microsoft Word decides where runs are created within a paragraph. The text you are searching for can therefore be half in one run and half in another. Taking the full text of a paragraph, removing its existing runs, and adding a new run with the adjusted text seems to solve the text replacement problem.
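
To make the split-run problem concrete, here is a minimal sketch that models a paragraph as a plain list of run texts. It uses no POI types; the class and method names (SplitRunDemo, indexOfRunContaining, paragraphContains) are made up for illustration. It only shows why a per-run search can miss text that a search over the joined paragraph text finds.

```java
import java.util.List;

/** Toy model: a paragraph is just a list of run texts (no POI types involved). */
final class SplitRunDemo {

    /** Index of the first run whose own text contains the needle, or -1 if none does. */
    static int indexOfRunContaining(List<String> runTexts, String needle) {
        for (int i = 0; i < runTexts.size(); i++) {
            if (runTexts.get(i).contains(needle)) {
                return i;
            }
        }
        return -1;
    }

    /** True if the needle occurs in the concatenated text of all runs. */
    static boolean paragraphContains(List<String> runTexts, String needle) {
        return String.join("", runTexts).contains(needle);
    }

    public static void main(String[] args) {
        // "needle" is split across two runs, as Word may do.
        List<String> runs = List.of("Hello nee", "dle world");
        System.out.println(indexOfRunContaining(runs, "needle")); // prints -1
        System.out.println(paragraphContains(runs, "needle"));    // prints true
    }
}
```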

However, doing the replacement at the paragraph level has a cost: you lose the formatting of the runs in that paragraph. For example, if the word "bits" was bold in the middle of your paragraph and you replaced "bits" with "bytes" while processing the file, the word "bytes" would no longer be bold. That is because the bold attribute was stored on a run, and that run was removed when the paragraph's entire body of text was replaced. The attached code has a commented-out section that worked for replacing text at the run level, if you need it.

It should also be noted that the code below works if the text you are inserting contains \n return characters. We could not find a way to insert line breaks other than creating a run for each section before a return and calling addCarriageReturn() on that run. Cheers

    package com.healthpartners.hcss.client.external.word.replacement;

    import java.util.List;

    import org.apache.commons.lang.StringUtils;
    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.apache.poi.xwpf.usermodel.XWPFRun;

    public class TextReplacer {
        private String searchValue;
        private String replacement;

        public TextReplacer(String searchValue, String replacement) {
            this.searchValue = searchValue;
            this.replacement = replacement;
        }

        public void replace(XWPFDocument document) {
            List<XWPFParagraph> paragraphs = document.getParagraphs();

            for (XWPFParagraph xwpfParagraph : paragraphs) {
                replace(xwpfParagraph);
            }
        }

        // Paragraph-level replacement: rebuilds the runs, so run formatting is lost
        private void replace(XWPFParagraph paragraph) {
            if (hasReplaceableItem(paragraph.getText())) {
                String replacedText = StringUtils.replace(paragraph.getText(), searchValue, replacement);

                removeAllRuns(paragraph);

                insertReplacementRuns(paragraph, replacedText);
            }
        }

        // One run per line of text, with a carriage return after every line but the last
        private void insertReplacementRuns(XWPFParagraph paragraph, String replacedText) {
            String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacedText, "\n");

            for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
                String part = replacementTextSplitOnCarriageReturn[j];

                XWPFRun newRun = paragraph.insertNewRun(j);
                newRun.setText(part);

                if (j + 1 < replacementTextSplitOnCarriageReturn.length) {
                    newRun.addCarriageReturn();
                }
            }
        }

        private void removeAllRuns(XWPFParagraph paragraph) {
            int size = paragraph.getRuns().size();
            for (int i = 0; i < size; i++) {
                paragraph.removeRun(0);
            }
        }

        private boolean hasReplaceableItem(String runText) {
            return StringUtils.contains(runText, searchValue);
        }

        //REVISIT The below can be removed if Michele tests and approved the above less versatile replacement version

        // private void replace(XWPFParagraph paragraph) {
        //     for (int i = 0; i < paragraph.getRuns().size() ; i++) {
        //         i = replace(paragraph, i);
        //     }
        // }

        // private int replace(XWPFParagraph paragraph, int i) {
        //     XWPFRun run = paragraph.getRuns().get(i);
        //
        //     String runText = run.getText(0);
        //
        //     if (hasReplaceableItem(runText)) {
        //         return replace(paragraph, i, run);
        //     }
        //
        //     return i;
        // }

        // private int replace(XWPFParagraph paragraph, int i, XWPFRun run) {
        //     String runText = run.getCTR().getTArray(0).getStringValue();
        //
        //     String beforeSuperLong = StringUtils.substring(runText, 0, runText.indexOf(searchValue));
        //
        //     String[] replacementTextSplitOnCarriageReturn = StringUtils.split(replacement, "\n");
        //
        //     String afterSuperLong = StringUtils.substring(runText, runText.indexOf(searchValue) + searchValue.length());
        //
        //     Counter counter = new Counter(i);
        //
        //     insertNewRun(paragraph, run, counter, beforeSuperLong);
        //
        //     for (int j = 0; j < replacementTextSplitOnCarriageReturn.length; j++) {
        //         String part = replacementTextSplitOnCarriageReturn[j];
        //
        //         XWPFRun newRun = insertNewRun(paragraph, run, counter, part);
        //
        //         if (j+1 < replacementTextSplitOnCarriageReturn.length) {
        //             newRun.addCarriageReturn();
        //         }
        //     }
        //
        //     insertNewRun(paragraph, run, counter, afterSuperLong);
        //
        //     paragraph.removeRun(counter.getCount());
        //
        //     return counter.getCount();
        // }

        // private class Counter {
        //     private int i;
        //
        //     public Counter(int i) {
        //         this.i = i;
        //     }
        //
        //     public void increment() {
        //         i++;
        //     }
        //
        //     public int getCount() {
        //         return i;
        //     }
        // }

        // private XWPFRun insertNewRun(XWPFParagraph xwpfParagraph, XWPFRun run, Counter counter, String newText) {
        //     XWPFRun newRun = xwpfParagraph.insertNewRun(counter.i);
        //     newRun.getCTR().set(run.getCTR());
        //     newRun.getCTR().getTArray(0).setStringValue(newText);
        //
        //     counter.increment();
        //
        //     return newRun;
        // }
    }

ron · Answer

My task was to replace texts of the format ${key} with values from a map within a Word DOCX document. The solutions above were a good starting point but did not take all cases into account: ${key} can be spread not only across multiple runs but also across multiple texts within a single run. I therefore ended up with the following code:

    private void replace(String inFile, Map<String, String> data, OutputStream out) throws Exception, IOException {
        XWPFDocument doc = new XWPFDocument(OPCPackage.open(inFile));
        for (XWPFParagraph p : doc.getParagraphs()) {
            replace2(p, data);
        }
        for (XWPFTable tbl : doc.getTables()) {
            for (XWPFTableRow row : tbl.getRows()) {
                for (XWPFTableCell cell : row.getTableCells()) {
                    for (XWPFParagraph p : cell.getParagraphs()) {
                        replace2(p, data);
                    }
                }
            }
        }
        doc.write(out);
    }

    private void replace2(XWPFParagraph p, Map<String, String> data) {
        String pText = p.getText(); // complete paragraph as string
        if (pText.contains("${")) { // if paragraph does not include our pattern, ignore
            TreeMap<Integer, XWPFRun> posRuns = getPosToRuns(p);
            Pattern pat = Pattern.compile("\\$\\{(.+?)\\}");
            Matcher m = pat.matcher(pText);
            while (m.find()) { // for all patterns in the paragraph
                String g = m.group(1); // extract key start and end pos
                int s = m.start(1);
                int e = m.end(1);
                String key = g;
                String x = data.get(key);
                if (x == null)
                    x = "";
                SortedMap<Integer, XWPFRun> range = posRuns.subMap(s - 2, true, e + 1, true); // get runs which contain the pattern
                boolean found1 = false; // found $
                boolean found2 = false; // found {
                boolean found3 = false; // found }
                XWPFRun prevRun = null; // previous run handled in the loop
                XWPFRun found2Run = null; // run in which { was found
                int found2Pos = -1; // pos of { within above run
                for (XWPFRun r : range.values()) {
                    if (r == prevRun)
                        continue; // this run has already been handled
                    if (found3)
                        break; // done working on current key pattern
                    prevRun = r;
                    for (int k = 0;; k++) { // iterate over texts of run r
                        if (found3)
                            break;
                        String txt = null;
                        try {
                            txt = r.getText(k); // note: should return null, but throws exception if the text does not exist
                        } catch (Exception ex) {
                        }
                        if (txt == null)
                            break; // no more texts in the run, exit loop
                        if (txt.contains("$") && !found1) { // found $, replace it with value from data map
                            // quoteReplacement keeps '$' and '\' in the value from being read as regex group references
                            txt = txt.replaceFirst("\\$", Matcher.quoteReplacement(x));
                            found1 = true;
                        }
                        if (txt.contains("{") && !found2 && found1) {
                            found2Run = r; // found { replace it with empty string and remember location
                            found2Pos = txt.indexOf('{');
                            txt = txt.replaceFirst("\\{", "");
                            found2 = true;
                        }
                        if (found1 && found2 && !found3) { // find } and set all chars between { and } to blank
                            if (txt.contains("}")) {
                                if (r == found2Run) { // complete pattern was within a single run
                                    txt = txt.substring(0, found2Pos) + txt.substring(txt.indexOf('}'));
                                } else // pattern spread across multiple runs
                                    txt = txt.substring(txt.indexOf('}'));
                            } else if (r == found2Run) // same run as { but no }, remove all text starting at {
                                txt = txt.substring(0, found2Pos);
                            else
                                txt = ""; // run between { and }, set text to blank
                        }
                        if (txt.contains("}") && !found3) {
                            txt = txt.replaceFirst("\\}", "");
                            found3 = true;
                        }
                        r.setText(txt, k);
                    }
                }
            }
            System.out.println(p.getText());
        }
    }

    // Maps every character position of the paragraph text to the run that holds it
    private TreeMap<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
        int pos = 0;
        TreeMap<Integer, XWPFRun> map = new TreeMap<Integer, XWPFRun>();
        for (XWPFRun run : paragraph.getRuns()) {
            String runText = run.text();
            if (runText != null && runText.length() > 0) {
                for (int i = 0; i < runText.length(); i++) {
                    map.put(pos + i, run);
                }
                pos += runText.length();
            }
        }
        return map;
    }
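
Taken on its own, the placeholder-matching step can be tried out on a plain string before dealing with runs at all. The following is a minimal sketch under that assumption: it works on the paragraph text only, touches no POI objects, and the names PlaceholderDemo and fill are hypothetical. Missing keys are replaced with an empty string, as in the answer above.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: fill ${key} placeholders in a plain string from a map (no POI involved). */
final class PlaceholderDemo {

    // Same pattern as in the answer: a literal "${", a lazily matched key, a literal "}".
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(.+?)\\}");

    static String fill(String paragraphText, Map<String, String> data) {
        Matcher m = PLACEHOLDER.matcher(paragraphText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Unknown keys become an empty string.
            String value = data.getOrDefault(m.group(1), "");
            // quoteReplacement stops '$' and '\' in the value from being treated as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> data = Map.of("name", "Ada", "amount", "$5");
        System.out.println(fill("Dear ${name}, you owe ${amount}.", data)); // prints Dear Ada, you owe $5.
    }
}
```

The hard part in a real document is writing the result back into the right runs, which is what the run-walking loop in the answer deals with.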

Thierry Bodhuin · Answer

If anybody also needs to preserve the formatting of the text, this code works better.

    // Maps every character position of the paragraph text to the run that holds it
    private static Map<Integer, XWPFRun> getPosToRuns(XWPFParagraph paragraph) {
        int pos = 0;
        Map<Integer, XWPFRun> map = new HashMap<Integer, XWPFRun>(10);
        for (XWPFRun run : paragraph.getRuns()) {
            String runText = run.text();
            if (runText != null) {
                for (int i = 0; i < runText.length(); i++) {
                    map.put(pos + i, run);
                }
                pos += runText.length();
            }
        }
        return (map);
    }

    public static <V> void replace(XWPFDocument document, Map<String, V> map) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        for (XWPFParagraph paragraph : paragraphs) {
            replace(paragraph, map);
        }
    }

    public static <V> void replace(XWPFDocument document, String searchText, V replacement) {
        List<XWPFParagraph> paragraphs = document.getParagraphs();
        for (XWPFParagraph paragraph : paragraphs) {
            replace(paragraph, searchText, replacement);
        }
    }

    private static <V> void replace(XWPFParagraph paragraph, Map<String, V> map) {
        for (Map.Entry<String, V> entry : map.entrySet()) {
            replace(paragraph, entry.getKey(), entry.getValue());
        }
    }

    public static <V> void replace(XWPFParagraph paragraph, String searchText, V replacement) {
        boolean found = true;
        while (found) {
            found = false;
            int pos = paragraph.getText().indexOf(searchText);
            if (pos >= 0) {
                found = true;
                Map<Integer, XWPFRun> posToRuns = getPosToRuns(paragraph);
                XWPFRun run = posToRuns.get(pos);
                XWPFRun lastRun = posToRuns.get(pos + searchText.length() - 1);
                int runNum = paragraph.getRuns().indexOf(run);
                int lastRunNum = paragraph.getRuns().indexOf(lastRun);
                String texts[] = replacement.toString().split("\n");
                run.setText(texts[0], 0);
                XWPFRun newRun = run;
                for (int i = 1; i < texts.length; i++) {
                    newRun.addCarriageReturn();
                    newRun = paragraph.insertNewRun(runNum + i);
                    /*
                        We should copy all style attributes
                        to the newRun from run
                        also from background color, ...
                        Here we duplicate only the simple attributes...
                     */
                    newRun.setText(texts[i]);
                    newRun.setBold(run.isBold());
                    newRun.setCapitalized(run.isCapitalized());
                    // newRun.setCharacterSpacing(run.getCharacterSpacing());
                    newRun.setColor(run.getColor());
                    newRun.setDoubleStrikethrough(run.isDoubleStrikeThrough());
                    newRun.setEmbossed(run.isEmbossed());
                    newRun.setFontFamily(run.getFontFamily());
                    newRun.setFontSize(run.getFontSize());
                    newRun.setImprinted(run.isImprinted());
                    newRun.setItalic(run.isItalic());
                    newRun.setKerning(run.getKerning());
                    newRun.setShadow(run.isShadowed());
                    newRun.setSmallCaps(run.isSmallCaps());
                    newRun.setStrikeThrough(run.isStrikeThrough());
                    newRun.setSubscript(run.getSubscript());
                    newRun.setUnderline(run.getUnderline());
                }
                // Remove the runs that held the rest of the search text
                for (int i = lastRunNum + texts.length - 1; i > runNum + texts.length - 1; i--) {
                    paragraph.removeRun(i);
                }
            }
        }
    }

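The position-to-run lookup used above stores one map entry per character. As a possible variation (not the author's code), the same lookup can be done by storing only the start offset of each run in a TreeMap and using floorEntry. The sketch below models runs as plain strings; RunOffsetDemo, startOffsets and runAt are hypothetical names, and it assumes a non-negative offset inside a non-empty paragraph.

```java
import java.util.List;
import java.util.TreeMap;

/** Sketch: find which run holds a given paragraph offset, using run start offsets only. */
final class RunOffsetDemo {

    /** Maps the start offset of each non-empty run to that run's index in the list. */
    static TreeMap<Integer, Integer> startOffsets(List<String> runTexts) {
        TreeMap<Integer, Integer> starts = new TreeMap<>();
        int pos = 0;
        for (int i = 0; i < runTexts.size(); i++) {
            String text = runTexts.get(i);
            if (!text.isEmpty()) {
                starts.put(pos, i);
                pos += text.length();
            }
        }
        return starts;
    }

    /** Index of the run containing the character at the given offset (offset must be >= 0). */
    static int runAt(TreeMap<Integer, Integer> starts, int offset) {
        // floorEntry returns the entry with the greatest start offset <= offset.
        return starts.floorEntry(offset).getValue();
    }

    public static void main(String[] args) {
        List<String> runs = List.of("Total: ${am", "ou", "nt} EUR");
        TreeMap<Integer, Integer> starts = startOffsets(runs);
        String search = "${amount}";
        int first = String.join("", runs).indexOf(search);
        int last = first + search.length() - 1;
        System.out.println(runAt(starts, first)); // prints 0
        System.out.println(runAt(starts, last));  // prints 2
    }
}
```
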
birya · Answer

The first chunk of code is giving me a NullPointerException. Does anyone know what is wrong?

run.getText(int position), from the documentation: Returns: the text of this text run or null if not set

Just check that it is not null before calling contains() on it.

And by the way, if you want to replace the text you need to set it at the position you read it from, in this case r.setText(text, 0);. Otherwise the text will be appended, not replaced.
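
The difference between appending and replacing can be illustrated with a toy stand-in for a run. This is a simplified model written for this explanation, not a POI class: ToyRun and its behaviour are assumptions based on the description above (text set without a position is appended, text set at position 0 overwrites what was read from position 0, and reading from a run with no text gives null).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a run: an ordered list of text segments. Not a POI class. */
final class ToyRun {
    private final List<String> segments = new ArrayList<>();

    /** Models setting text without a position: a new segment is appended. */
    void setText(String value) {
        segments.add(value);
    }

    /** Models setting text at a position: the existing segment there is overwritten. */
    void setText(String value, int pos) {
        if (pos == segments.size()) {
            segments.add(value);
        } else {
            segments.set(pos, value);
        }
    }

    /** Models reading text: null when the run has no text at all. */
    String getText(int pos) {
        return segments.isEmpty() ? null : segments.get(pos);
    }

    /** All segments joined, i.e. what a reader of the document would see. */
    String text() {
        return String.join("", segments);
    }

    public static void main(String[] args) {
        ToyRun appended = new ToyRun();
        appended.setText("needle here");
        String text = appended.getText(0);
        if (text != null && text.contains("needle")) {       // null check before contains()
            appended.setText(text.replace("needle", "haystack"));
        }
        System.out.println(appended.text()); // prints needle herehaystack here

        ToyRun replaced = new ToyRun();
        replaced.setText("needle here");
        replaced.setText(replaced.getText(0).replace("needle", "haystack"), 0);
        System.out.println(replaced.text()); // prints haystack here
    }
}
```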

Sherin · Answer

The accepted answer here needs one more update, along with Justin Skiles's update: r.setText(text, 0);. Reason: if setText is not called with the position argument, the output will be the old string followed by the replacement string.

Dmitry Stolbov · Answer

Here is a replaceParagraph implementation that replaces ${key} with its value (from the fieldsForReport parameter) and preserves the formatting by merging the runs that contain ${key}.

    private void replaceParagraph(XWPFParagraph paragraph, Map<String, String> fieldsForReport) throws POIXMLException {
        String find, text, runsText;
        List<XWPFRun> runs;
        XWPFRun run, nextRun;
        for (String key : fieldsForReport.keySet()) {
            text = paragraph.getText();
            if (!text.contains("${"))
                return;
            find = "${" + key + "}";
            if (!text.contains(find))
                continue;
            runs = paragraph.getRuns();
            for (int i = 0; i < runs.size(); i++) {
                run = runs.get(i);
                runsText = run.getText(0);
                if (runsText.contains("${") || (runsText.contains("$") && runs.get(i + 1).getText(0).substring(0, 1).equals("{"))) {
                    // Merge the following runs into this one until the placeholder is closed
                    while (!runsText.contains("}")) {
                        nextRun = runs.get(i + 1);
                        runsText = runsText + nextRun.getText(0);
                        paragraph.removeRun(i + 1);
                    }
                    run.setText(runsText.contains(find) ?
                            runsText.replace(find, fieldsForReport.get(key)) :
                            runsText, 0);
                }
            }
        }
    }

replaceParagraph implementation

Unit test

Deividas Strioga · Answer

As of the date of writing, none of the answers replaces text properly.

Gagravarr's answer does not cover cases where the words to replace are split across runs. Thierry Bodhuin's solution sometimes left words to replace blank when they came after other words to replace, and it does not check tables either.

Using Gagravarr's answer as a base, I added an else block that also checks the run before the current one, in case the text of the two runs together contains the word to replace. My addition, in Kotlin:

    if (text != null) {
        if (text.contains(findText)) {
            text = text.replace(findText, replaceText)
            r.setText(text, 0)
        } else if (i > 0 && p.runs[i - 1].getText(0).plus(text).contains(findText)) {
            val pos = p.runs[i - 1].getText(0).indexOf('$')
            text = textOfNotFullSecondRun(text, findText)
            r.setText(text, 0)
            val findTextLengthInFirstRun = findTextPartInFirstRun(p.runs[i - 1].getText(0), findText)
            val prevRunText = p.runs[i - 1].getText(0).replaceRange(pos, findTextLengthInFirstRun, replaceText)
            p.runs[i - 1].setText(prevRunText, 0)
        }
    }

    private fun textOfNotFullSecondRun(text: String, findText: String): String {
        return if (!text.contains(findText)) {
            textOfNotFullSecondRun(text, findText.drop(1))
        } else {
            text.replace(findText, "")
        }
    }

    private fun findTextPartInFirstRun(text: String, findText: String): Int {
        return if (text.contains(findText)) {
            findText.length
        } else {
            findTextPartInFirstRun(text, findText.dropLast(1))
        }
    }

Here p.runs is the list of runs in a paragraph. The same applies to the search block for tables. With this solution I have not had any issues so far. All formatting is intact.

Edit: I made a Java library for this replacement, check it out: https://github.com/deividasstr/docx-Word-replacer
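
The general idea of replacing text that spans several runs while leaving the runs themselves in place can be sketched on plain strings. This is an illustrative model, not the code from the answer or the linked library: runs are strings, SpanningReplaceDemo and replaceAcrossRuns are hypothetical names, and only the first occurrence is replaced. The replacement goes into the first affected run, so in a real document it would pick up that run's formatting.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: replace the first occurrence of a search text that may span several runs. */
final class SpanningReplaceDemo {

    static List<String> replaceAcrossRuns(List<String> runTexts, String find, String replacement) {
        List<String> result = new ArrayList<>(runTexts);
        int start = String.join("", result).indexOf(find);
        if (start < 0) {
            return result; // nothing to replace
        }
        int end = start + find.length(); // exclusive end of the match in paragraph coordinates
        int runStart = 0;
        boolean inserted = false;
        for (int i = 0; i < result.size(); i++) {
            String text = result.get(i);
            int runEnd = runStart + text.length();
            // Part of this run that overlaps the match, in run-local coordinates.
            int from = Math.max(start, runStart) - runStart;
            int to = Math.min(end, runEnd) - runStart;
            if (from < to) {
                // The first overlapping run receives the replacement; later ones just lose their part.
                String middle = inserted ? "" : replacement;
                inserted = true;
                result.set(i, text.substring(0, from) + middle + text.substring(to));
            }
            runStart = runEnd;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> runs = List.of("Hello ${na", "me}, welcome");
        List<String> replaced = replaceAcrossRuns(runs, "${name}", "Ada");
        System.out.println(String.join("|", replaced)); // prints Hello Ada|, welcome
    }
}
```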

Optio · Answer

I suggest my solution for replacing text between # symbols, for example: This #bookmark# should be replaced. The text is replaced in:

paragraphs;
tables;
footers.

It also handles situations where the # symbol and the bookmark are in separate runs (it replaces a variable spread across different runs).

Here is a link to the code: https://Gist.github.com/aerobium/bf02e443c079c5caec7568e167849dda