Extract metadata and metrics from a PDF in Java

Created: March 4, 2024

Last updated: March 4, 2024

You can use the JPedal library to extract metadata about a PDF file. The PdfUtilities class provides several methods for this.

The sample code below shows the available calls. To use PdfUtilities in your own application, copy it and delete the lines you do not need.

				
import java.util.Map;
import org.jpedal.examples.PdfUtilities;
import org.jpedal.exception.PdfException;

final PdfUtilities utilities = new PdfUtilities("path/to/exampleFile.pdf");
utilities.setPassword("password"); //Only required if the file requires a password
try {
    if (utilities.openPDFFile()) {
        //Returns true if the file contains any embedded fonts
        final boolean hasEmbeddedFonts = utilities.hasEmbeddedFonts();

        //Returns a map where the key is the page number and the value is a String detailing fonts for that page
        final Map<Integer, String> documentFontData = utilities.getAllFontDataForDocument();

        //Returns a String containing all metadata fields for the document
        final String documentPropertiesAsXML = utilities.getDocumentPropertyFieldsInXML();

        //Returns a map where the key is the property name and the value is the property's value
        final Map<String, String> documentPropertiesAsMap = utilities.getDocumentPropertyStringValuesAsMap();

        //Returns true if the file conforms to all tagged PDF conventions. It may be possible to extract some tagged content even if false
        final boolean isFullyTagged = utilities.isMarkedContent();

        //Returns the permissions value for this PDF and shows the permissions as a string in the console
        final int permissions = utilities.getPdfFilePermissions();
        PdfUtilities.showPermissionsAsString(permissions);

        //Returns the total page count as an int
        final int totalPageCount = utilities.getPageCount();

        //Page numbers start at 1, so the loop must include totalPageCount to reach the last page
        for (int i = 1; i <= totalPageCount; i++) {
            //Get the page dimensions for the specified page in the given units and type
            final float[] pageDimensions = utilities.getPageDimensions(i, PdfUtilities.PageUnits.Pixels, PdfUtilities.PageSizeType.CropBox);

            //Returns the total number of PDF commands used to define the specified page's content
            final int commandCountForPage = utilities.getCommandCountForPageStream(i);

            //Returns the font data as a string for the specified page
            final String fontDataForPage = utilities.getFontDataForPage(i);

            //Returns the image data as a String for the specified page
            final String xImageDataForPage = utilities.getXImageDataForPage(i);

        }
    }
} catch(final PdfException e) {
    e.printStackTrace();
}
utilities.closePDFfile();
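The property map in the sample is declared as an ordinary java.util.Map, so it can be post-processed without further JPedal calls. As a minimal sketch, the helper below turns such a map into a sorted, human-readable report for logging. The class name DocumentPropertyReport and its format method are hypothetical and not part of JPedal, and the sample data in main is a stand-in for the map returned by getDocumentPropertyStringValuesAsMap().

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of JPedal): formats a property map for logging.
final class DocumentPropertyReport {

    // Builds one "name: value" line per non-blank property, sorted by property name.
    // Assumes the map has no null keys (TreeMap rejects them).
    static String format(final Map<String, String> properties) {
        final StringBuilder report = new StringBuilder();
        for (final Map.Entry<String, String> entry : new TreeMap<>(properties).entrySet()) {
            final String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // skip fields the document leaves blank
            }
            report.append(entry.getKey()).append(": ").append(value.trim()).append('\n');
        }
        return report.toString();
    }

    public static void main(final String[] args) {
        // Stand-in data; in real use, pass utilities.getDocumentPropertyStringValuesAsMap()
        final Map<String, String> sample = new LinkedHashMap<>();
        sample.put("Title", "Example");
        sample.put("Author", "");
        sample.put("Creator", "Writer");
        System.out.print(format(sample));
    }
}
```

Copying into a TreeMap keeps the output order stable between runs, which makes the report easier to compare across documents.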

The PdfUtilities class can also be instantiated with a second constructor that accepts the PDF as a byte array instead of a file name. This constructor can be used as follows.

				
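//Read the PDF into pdfByteArray (uses java.nio.file.Files and java.nio.file.Paths)
final byte[] pdfByteArray = Files.readAllBytes(Paths.get("path/to/exampleFile.pdf"));

final PdfUtilities utilities = new PdfUtilities(pdfByteArray);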
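The byte array constructor is mainly useful when the PDF does not exist as a file on disk, for example when it arrives as an upload or a database blob. As a rough sketch under that assumption, the helper below drains any InputStream into a byte array using only the standard library. The class name PdfBytes and the method readFully are hypothetical and not part of JPedal, and the stream in main is a stand-in for a real PDF source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JPedal): reads a whole stream into memory.
final class PdfBytes {

    // Copies the stream in fixed-size chunks until end of stream is reached.
    static byte[] readFully(final InputStream in) throws IOException {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return buffer.toByteArray();
    }

    public static void main(final String[] args) throws IOException {
        // Stand-in stream; a real caller would use an upload, socket, or resource stream
        final byte[] fakePdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII);
        try (InputStream in = new ByteArrayInputStream(fakePdf)) {
            final byte[] pdfByteArray = readFully(in);
            System.out.println("Read " + pdfByteArray.length + " bytes");
            // The array can then be passed on: new PdfUtilities(pdfByteArray)
        }
    }
}
```

Because the whole document is held in memory, this approach suits PDFs of moderate size; for very large files the file name constructor may be the better fit.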
