Read / Write Metadata - Excel POI Example

We have seen so much on working with actual data in POI so far. In this post, I would like to touch a bit on the metadata. A document is never complete without adding suitable metadata and Apache POI offers a plethora of options to set metadata against a document. We will discuss how to read and write metadata into Excel workbooks using Apache POI in Java with simple working examples in this post. We will focus on XLSX document types. For Office 97 – 2003 formats, we will provide another post that would explain how metadata can be extracted / written.

Metadata in Excel spreadsheets can be classified into three major categories;

Core Properties:  These are defined in org.apache.poi.POIXMLProperties.CoreProperties and include the following properties that you can read / set against:

  • Category – Category of the document
  • Content Status – Status of the content
  • Content Type – Type of Content
  • Document Created TimeStamp
  • Description of the Document
  • Identifiers
  • Keywords in the document
  • Last printed date
  • Modified Date
  • Revision
  • Subject of the document
  • Title of the document

Extended Properties: Extended properties are applicable across all Office files and defined in org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties class. Some of the properties that you can set through this class in Apache POI is provided below:

  • Company
  • Manager
  • Hyperlink Base
Custom Properties: On top of core and extended properties, you can also add custom properties as name value pairs. You can add your own or choose from a list that is already provided as a part of Office. 

These include the following – Checked By, Client, Date Completed, Department, Destination, Disposition, Division, Document Number, Editor, Forward to, Group, Language, Mailstop, Matter, Office, Owner, Project, Publisher, Purpose, Received From, Recorded By, Reference, Source, Status, Telephone Number and Typist. You can set a type against such custom properties which can be Text, Date, Number or Yes /No. Finally you can also provide a value for such properties.

With this bit of theory, let us now write a simple Java program that will help us to set all these properties against an Excel Spreadsheet.

Excel File: Write Metadata Using Apache POI – Java Example Program


 import java.io.FileOutputStream;
 import java.io.IOException;
 import org.apache.poi.xssf.usermodel.XSSFWorkbook;
 import org.apache.poi.xssf.usermodel.*;
 import org.apache.poi.POIXMLProperties;
 import org.apache.poi.POIXMLProperties.*;
 import org.openxmlformats.schemas.officeDocument.x2006.extendedProperties.CTProperties;
 public class setMetadata
 {
   public static void main(String[] args)
     throws IOException
   {
     /* Start with a workbook object to write comments into */
     XSSFWorkbook xlsxSetMetadata = new XSSFWorkbook();
     xlsxSetMetadata.createSheet("Write Metadata");
     POIXMLProperties props = xlsxSetMetadata.getProperties();
     
     /* Let us set some core properties now*/
     POIXMLProperties.CoreProperties coreProp=props.getCoreProperties();
     coreProp.setCreator("Thinktibits"); //set document creator
     coreProp.setDescription("set Metadata using Apache POI / Java"); //set Description
     coreProp.setKeywords("Apache POI, Metadata, Java, Example Program, XLSX "); //set keywords
     coreProp.setTitle("XLSX Metadata Example Apache POI "); //Title of the document
     coreProp.setSubjectProperty("Code"); //Subject
     coreProp.setCategory("Programming"); //category
     
     /* We can now set some extended properties */
     POIXMLProperties.ExtendedProperties extProp=props.getExtendedProperties();
     extProp.getUnderlyingProperties().setCompany("Apache Software Foundation");
     extProp.getUnderlyingProperties().setTemplate("XSSF");
     extProp.getUnderlyingProperties().setManager("Manager Name");
     
     /* Finally, we can set some custom Properies */
     POIXMLProperties.CustomProperties custProp = props.getCustomProperties();
     custProp.addProperty("Author", "Thinktibits");// String
     custProp.addProperty("Year", 2014);     // Number Property
     custProp.addProperty("Published", true); //Yes No Property
     custProp.addProperty("Typist", "tika");
     
     /* Write all these metadata into workbook and Close Document */
     
     String fname = "Metadata.xlsx";
     FileOutputStream out = new FileOutputStream(fname);
     xlsxSetMetadata.write(out);
     out.close();
   }
 }

The output of this program is shown below (2 screens : One for Core/ Extended Metadata, Another for Custom Metadata)

Apache POI - Add Custom Metadata in Excel - Example Output
Apache POI - Add Custom Metadata in Excel - Example Output

Java: Add Core / Extended Metadata to Excel Using POI - Output
Java: Add Core / Extended Metadata to Excel Using POI - Output
Like how you can write metadata information into Excel Files, Apache POI also provides rich set of methods to read them and use them for various purposes – there are different getter methods available to read different types of properties. As an example, let us read the same document that we created in the step earlier, and print the metadata information out in the console. Here is a program that does that with the output;

Excel File: Read Metadata with Apache POI – Example Code


 import java.io.FileInputStream;
 import java.io.*;
 import java.util.List;
 import org.apache.poi.xssf.usermodel.XSSFWorkbook;
 import org.apache.poi.xssf.usermodel.*;
 import org.apache.poi.POIXMLProperties;
 import org.apache.poi.POIXMLProperties.*;
 import org.openxmlformats.schemas.officeDocument.x2006.customProperties.*;
 public class getMetadata
 {
   public static void main(String[] args)
     throws IOException
   {
     FileInputStream input_document = new FileInputStream(new File("Metadata.xlsx")); //Read XLSX document - Office 2007, 2010 format     
     XSSFWorkbook readMetadata = new XSSFWorkbook(input_document); //Read the Excel Workbook in a instance object    

     POIXMLProperties props = readMetadata.getProperties();
     
     
     POIXMLProperties.CoreProperties coreProp=props.getCoreProperties();
     /* Read and print core properties as SOP */
     System.out.println("Document Creator :" + coreProp.getCreator()); 
     System.out.println("Description :" + coreProp.getDescription());
     System.out.println("Keywords :" + coreProp.getKeywords());
     System.out.println("Title :" + coreProp.getTitle());
     System.out.println("Subject :" + coreProp.getSubject());
     System.out.println("Category :" + coreProp.getCategory());
     
     
     /* Read and print extended properties */
     POIXMLProperties.ExtendedProperties extProp=props.getExtendedProperties();
     System.out.println("Company :" + extProp.getUnderlyingProperties().getCompany());
     System.out.println("Template :" + extProp.getUnderlyingProperties().getTemplate());
     System.out.println("Manager Name :" + extProp.getUnderlyingProperties().getManager());
     
     
     /* Finally, we can retrieve some custom Properies */
     POIXMLProperties.CustomProperties custProp = props.getCustomProperties();
     List<CTProperty> my1=custProp.getUnderlyingProperties().getPropertyList();
     System.out.println("Size :" + my1.size());
     for (int i = 0; i < my1.size(); i++) {
            CTProperty pItem = my1.get(i);
            System.out.println(""+pItem.getPid());
            System.out.println(""+pItem.getFmtid());
            System.out.println(""+pItem.getName());
            System.out.println(""+pItem.getLpwstr());
            
     }
 
   }
 }

This programs prints the metadata information and the output is shown below:

Document Creator :Thinktibits
Description :set Metadata using Apache POI / Java
Keywords :Apache POI, Metadata, Java, Example Program, XLSX
Title :XLSX Metadata Example Apache POI
Subject :Code
Category :Programming
Company :Apache Software Foundation
Template :XSSF
Manager Name :Manager Name
Size :4
2
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Author
Thinktibits
3
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Year
null
4
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Published
null
5
{D5CDD505-2E9C-101B-9397-08002B2CF9AE}
Typist
tika

If you notice the custom properties section output, you are able to get the output for properties that are of type “string”. You need to use different methods to get if you use a Yes/No or number type of custom property. I would like leave this for you to give a try – post a comment here if you are stuck. The examples presented in this post are tested with POIv3.8. You can modify the code provided above to update metadata on existing Excel documents, if you have such a requirement.

1 comment:

  1. Can custom properties on worksheet level (in VBA accessed as Worksheet.CustomProperties) also be accessed via POI?

    ReplyDelete