Extracting Metadata and Metrics from a PDF in Java

Created: March 4, 2024

Last updated: March 4, 2024

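You can use the JPedal library to extract metadata about a PDF file. The PdfUtilities class provides several methods for this purpose.

The sample code below shows how to use PdfUtilities in your own application; simply delete the lines you do not need.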

				
import java.util.Map;

// PdfUtilities and PdfException are JPedal classes; import them from the packages listed in the JPedal Javadoc

final PdfUtilities utilities = new PdfUtilities("path/to/exampleFile.pdf");
utilities.setPassword("password"); // Only required if the file requires a password
try {
    if (utilities.openPDFFile()) {
        // Returns true if the file contains any embedded fonts
        final boolean hasEmbeddedFonts = utilities.hasEmbeddedFonts();

        // Returns a map where the key is the page number and the value is a String detailing the fonts for that page
        final Map<Integer, String> documentFontData = utilities.getAllFontDataForDocument();

        // Returns a String containing all metadata fields for the document
        final String documentPropertiesAsXML = utilities.getDocumentPropertyFieldsInXML();

        // Returns a map where the key is the property name and the value is the property's value
        final Map<String, String> documentPropertiesAsMap = utilities.getDocumentPropertyStringValuesAsMap();

        // Returns true if the file conforms to all tagged PDF conventions.
        // It may be possible to extract some tagged content even if this is false
        final boolean isFullyTagged = utilities.isMarkedContent();

        // Returns the permissions value for this PDF and shows the permissions as a string in the console
        final int permissions = utilities.getPdfFilePermissions();
        PdfUtilities.showPermissionsAsString(permissions);

        // Returns the total page count as an int
        final int totalPageCount = utilities.getPageCount();

        // Page numbers start at 1, so the loop must include totalPageCount to reach the last page
        for (int i = 1; i <= totalPageCount; i++) {
            // Gets the page dimensions for the specified page in the given units and type
            final float[] pageDimensions = utilities.getPageDimensions(i, PdfUtilities.PageUnits.Pixels, PdfUtilities.PageSizeType.CropBox);

            // Returns the total number of PDF commands used to define the specified page's content
            final int commandCountForPage = utilities.getCommandCountForPageStream(i);

            // Returns the font data as a String for the specified page
            final String fontDataForPage = utilities.getFontDataForPage(i);

            // Returns the image data as a String for the specified page
            final String xImageDataForPage = utilities.getXImageDataForPage(i);
        }
    }
} catch (final PdfException e) {
    e.printStackTrace();
} finally {
    // Close the file even if extraction fails part-way through
    utilities.closePDFfile();
}
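
Once the property map has been retrieved, it usually needs to be logged or displayed somewhere. The following is a minimal sketch of one way to do that and is not part of JPedal: `PropertyReport` and its `format` method are hypothetical names, and the sample values in `main` are made up to stand in for the map returned by `getDocumentPropertyStringValuesAsMap()`. It assumes the map contains no null keys.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not part of JPedal): turns a property map into readable text.
final class PropertyReport {

    private PropertyReport() {
    }

    // Builds one "name: value" line per property, sorted by name.
    // Properties whose value is null or blank are skipped.
    static String format(final Map<String, String> properties) {
        final StringBuilder report = new StringBuilder();
        // TreeMap sorts the entries by key; it rejects null keys
        for (final Map.Entry<String, String> entry : new TreeMap<>(properties).entrySet()) {
            final String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue;
            }
            report.append(entry.getKey()).append(": ").append(value.trim()).append('\n');
        }
        return report.toString();
    }

    public static void main(final String[] args) {
        // Made-up sample values standing in for utilities.getDocumentPropertyStringValuesAsMap()
        final Map<String, String> sample = new LinkedHashMap<>();
        sample.put("Title", "Example file");
        sample.put("Author", "");
        sample.put("Creator", "Example writer");
        System.out.print(format(sample));
    }
}
```

In an application, the map from the sample code above would be passed to `PropertyReport.format` in place of the made-up values. Sorting by name keeps the output stable between runs, which makes it easier to compare reports for different files.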

The PdfUtilities class can also be instantiated with a second constructor that accepts the PDF as a byte array instead of a file name. This constructor can be used as follows.

				
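import java.nio.file.Files;
import java.nio.file.Paths;

// Read the PDF into a byte array (Files.readAllBytes throws IOException, which the caller must handle)
final byte[] pdfByteArray = Files.readAllBytes(Paths.get("path/to/exampleFile.pdf"));

final PdfUtilities utilities = new PdfUtilities(pdfByteArray);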
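
The byte array constructor is most useful when the PDF does not exist as a file on disk, for example when it arrives as an upload or a network response. The sketch below shows one possible way to buffer such a stream into a byte array using only the standard library. `PdfBytes`, `readFully` and `looksLikePdf` are hypothetical names and are not part of JPedal. The marker check relies on the fact that PDF files normally begin with `%PDF-`, but some files carry extra bytes before it, so a failed check is a hint rather than proof.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of JPedal): loads a PDF from any InputStream.
final class PdfBytes {

    private PdfBytes() {
    }

    // Reads the stream to its end and returns everything that was read.
    // The caller remains responsible for closing the stream.
    static byte[] readFully(final InputStream in) throws IOException {
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        final byte[] chunk = new byte[8192];
        int read;
        while ((read = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, read);
        }
        return buffer.toByteArray();
    }

    // Returns true if the data starts with the "%PDF-" marker.
    static boolean looksLikePdf(final byte[] data) {
        final byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data == null || data.length < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (data[i] != marker[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The array returned by `PdfBytes.readFully` can then be passed to `new PdfUtilities(...)` exactly as `pdfByteArray` is above. Buffering the whole document in memory keeps the example simple, though very large files may be better written to a temporary file and opened by file name instead.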
