Search ZIP File with Regex - Java NIO Example

In this example, we will discuss how to use regular expressions to search for entries inside a ZIP file using a Java program. This post can be considered an extension of the earlier post on Glob pattern based search on ZIP files using Java. We will use the same approach as outlined in the Glob tutorial to do a regex based search, and only the differences are discussed in this tutorial. Readers are advised to go through the Glob tutorial, as it explains in depth how this code works.

Regex Search ZIP File Entries:

The complete Java program to do a regex based search on ZIP file entries using Java NIO / ZPFS is provided below:


import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.*;
import java.io.IOException;
import java.util.*;
import java.net.URI;

class regexSearch implements FileVisitor<Path> {

    private final PathMatcher matcher;

    public regexSearch(String searchPattern) {
       /* "regex:" tells getPathMatcher to treat the pattern as a java.util.regex expression */
       matcher = FileSystems.getDefault().getPathMatcher("regex:" + searchPattern);
    }

    @Override
    public FileVisitResult visitFile(Path my_file, BasicFileAttributes attrs)
    throws IOException {
            /* Match only the file name, not the full path inside the archive */
            Path name = my_file.getFileName();
            if (name != null && matcher.matches(name)) {
                    System.out.println("Searched file was found: " + name + " in " + my_file.toRealPath().toString());
            }
            return FileVisitResult.CONTINUE;
    }
    /* We don't use these, but FileVisitor requires them, so they just continue the walk */
    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    public static void main(String args[]) throws IOException {

        /* A sample regex pattern for testing. A "regex:" PathMatcher must match the WHOLE
           file name, so this finds .sql files whose base name contains only letters */
        String searchPattern = "[a-zA-Z]+\\.sql";
        regexSearch walk = new regexSearch(searchPattern);

        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        /* try-with-resources closes the ZIP file system when the search is done */
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {
            Iterable<Path> dirs = zipfs.getRootDirectories();
            /* For every root folder inside the ZIP archive */
            for (Path root : dirs) {
                Files.walkFileTree(root, walk);
            }
        }
    }
}
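A point worth checking before you pick a pattern: a regex: PathMatcher has to match the whole file name, much like String.matches, instead of finding the pattern somewhere inside it. A pattern such as [a-zA-Z] therefore only matches one-letter names. The small sketch below tries a few patterns against made-up file names on the default file system; the class name RegexNameCheck and the sample names are illustrative only.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class RegexNameCheck {

    // Returns true only if the regex matches the WHOLE file name
    // ("regex:" PathMatchers behave like String.matches, not Matcher.find).
    static boolean matches(String regex, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("regex:" + regex);
        return matcher.matches(Paths.get(fileName));
    }

    public static void main(String[] args) {
        String[] names = {"a", "dest.sql", "dest_2.sql", "final.log"};
        String[] patterns = {"[a-zA-Z]", "[a-zA-Z]+\\.sql", ".*\\.(sql|log)"};
        for (String pattern : patterns) {
            for (String name : names) {
                System.out.println(pattern + " vs " + name + " -> " + matches(pattern, name));
            }
        }
    }
}
```

If you need "contains" semantics, wrap the expression in .* on both sides.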


Search ZIP File - Glob Pattern - Java NIO Example

In this example, we will describe how to search a ZIP archive with a Glob / wild card pattern, using a Java NIO program. We will read the ZIP file using the Zip File System Provider, and use java.nio.file.PathMatcher to match the contents of the ZIP file against the pattern we want to search for. This article builds on the search ZIP entries by file name tutorial. You may wish to refer to that tutorial for background on this approach.

Create ZIP File System / Scan ZIP Folders

We have discussed this step many times in this blog now. We create a ZPFS using the methods available in Java NIO, and mount the ZIP file that we would like to search as a file system. Then, we get the root folders of the ZIP file using the getRootDirectories method. We iterate over each of those and call the walkFileTree method for it. This scans the individual files, and each scanned file is passed to the visitFile method that we implement from the FileVisitor interface. Inside this method, we can use a PathMatcher object to match the scanned file against the Glob pattern.

        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties);
        
        Iterable<Path> dirs = zipfs.getRootDirectories();
        
        for (Path root : dirs) {                
                Files.walkFileTree(root, walk);            
        }        
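The snippet above assumes that sp.zip already exists and that walk is the FileVisitor from the complete program. If you want something you can run straight away, here is a minimal self-contained sketch. It first builds a throwaway ZIP in a temp folder with ZPFS (create set to true), then mounts it with create set to false and walks every root directory, collecting file paths with a SimpleFileVisitor. The class name, entry names and folder layout are assumptions chosen to mirror the sample archive in this post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

class ZipWalkSketch {

    // Builds a throwaway archive; "create" = "true" makes ZPFS create the ZIP file.
    static URI createSampleZip(Path zipFile) throws IOException {
        URI uri = URI.create("jar:" + zipFile.toUri());
        Map<String, String> env = new HashMap<>();
        env.put("create", "true");
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            Files.createDirectory(zipfs.getPath("/sp"));
            Files.createDirectory(zipfs.getPath("/sp/sp2"));
            byte[] data = "select 1;".getBytes(StandardCharsets.UTF_8);
            Files.write(zipfs.getPath("/dest.sql"), data);
            Files.write(zipfs.getPath("/sp/dest.sql"), data);
            Files.write(zipfs.getPath("/sp/sp2/final.sql"), data);
            Files.write(zipfs.getPath("/sp/readme.txt"), data);
        } // closing the file system writes the ZIP to disk
        return uri;
    }

    // Mounts an existing ZIP ("create" = "false") and collects the path of every file entry.
    static List<String> listFiles(URI uri) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", "false");
        List<String> found = new ArrayList<>();
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            for (Path root : zipfs.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        found.add(file.toString());
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
        }
        Collections.sort(found);
        return found;
    }

    // Convenience wrapper: builds a sample ZIP in a temp folder and lists its files.
    static List<String> demo() {
        try {
            Path zip = Files.createTempDirectory("zpfs-walk").resolve("sp.zip");
            return listFiles(createSampleZip(zip));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Directories are reported through preVisitDirectory rather than visitFile, so only the four file entries end up in the list.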


Searching ZIP Archives Using java.nio.file.PathMatcher

The PathMatcher interface in Java NIO accepts a search pattern, compares it with a Path object, and returns true to the calling program if there is a match. To create a PathMatcher, we use the getPathMatcher method available in java.nio.file.FileSystem and pass it the search pattern, prefixed with the pattern syntax. Note that it is also possible to pass a regular expression to the getPathMatcher method. Using this, we will be able to search a ZIP archive for files that match a regular expression. We will cover that in detail in another post. For now, we will focus on glob based ZIP entry searches alone. The code snippet to create a PathMatcher object is shown below:

       matcher = FileSystems.getDefault().getPathMatcher("glob:" + searchPattern);

For this example, we set the string searchPattern to *.sql, which will find matching .sql files inside the ZIP archive.


Match ZIP File Entries with GLOB Pattern

While scanning every single ZIP entry in the archive, we get the Path of the scanned file inside the ZIP archive and, from it, the file name. The file name can then be matched against the PathMatcher object created earlier, using its matches method. If this method returns true, we have found a match in the ZIP archive. Otherwise, we move on to the next file. This has to be repeated recursively until all the files in the ZIP file are scanned. This recursive operation is automatically managed by the walkFileTree method, which we discussed in the previous tutorial.

            if (name != null && matcher.matches(name)) {
                    System.out.println("Searched file was found: " + name + " in " + my_file.toRealPath().toString());
            }
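The snippet matches name, the value returned by getFileName(), and not the full entry path. That choice matters for glob patterns: * does not cross directory boundaries, so *.sql matches dest.sql but not sp/sp2/final.sql, while ** does cross them. The sketch below (the class name and paths are hypothetical) shows this on the default file system.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

class GlobDepthCheck {

    // In a glob, "*" stops at directory separators while "**" crosses them.
    static boolean globMatches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // A bare file name matches "*.sql"
        System.out.println(globMatches("*.sql", "dest.sql"));
        // A path with folders does not, because "*" cannot cross "/"
        System.out.println(globMatches("*.sql", "sp/sp2/final.sql"));
        // "**" can cross folder boundaries
        System.out.println(globMatches("**.sql", "sp/sp2/final.sql"));
        // Matching only getFileName() sidesteps the issue entirely
        System.out.println(globMatches("*.sql",
                Paths.get("sp/sp2/final.sql").getFileName().toString()));
    }
}
```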


Complete Java NIO Program to Search ZIP File Entries with GLOB Pattern

The complete Java program for this tutorial is given below:

import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.*;
import java.io.IOException;
import java.util.*;
import java.net.URI;

class globSearch implements FileVisitor<Path> {

    private final PathMatcher matcher;

    public globSearch(String searchPattern) {
       matcher = FileSystems.getDefault().getPathMatcher("glob:" + searchPattern);
    }

    @Override
    public FileVisitResult visitFile(Path my_file, BasicFileAttributes attrs)
    throws IOException {
            /* Match only the file name, not the full path inside the archive */
            Path name = my_file.getFileName();
            if (name != null && matcher.matches(name)) {
                    System.out.println("Searched file was found: " + name + " in " + my_file.toRealPath().toString());
            }
            return FileVisitResult.CONTINUE;
    }
    /* We don't use these, but FileVisitor requires them, so they just continue the walk */
    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    public static void main(String args[]) throws IOException {

        /* We want to find all .sql files inside the ZIP file */
        String searchPattern = "*.sql";
        globSearch walk = new globSearch(searchPattern);

        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        /* try-with-resources closes the ZIP file system when the search is done */
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {
            Iterable<Path> dirs = zipfs.getRootDirectories();
            /* For every root folder inside the ZIP archive */
            for (Path root : dirs) {
                Files.walkFileTree(root, walk);
            }
        }
    }
}

For a ZIP File structure as shown in the screenshot below:

Search ZIP File using Wild Cards / Glob - Input File

The output of this program is shown below:

Searched file was found: dest.sql in /dest.sql
Searched file was found: mira.sql in /sp/sp2/mira.sql
Searched file was found: final.sql in /sp/sp2/final.sql
Searched file was found: dest.sql in /sp/dest.sql

Not only this, you can also use different GLOB patterns to get a wide variety of search results inside the ZIP file. Some of these are documented in the table below:

Pattern          Search Output
*.sql            Matches all .sql files inside the ZIP archive
*.{sql,log}      Matches all files with extension .sql or .log
fin*.sql         Finds all .sql files whose file name starts with “fin”
fin.?            Finds all files named “fin” that have a single-character extension
*fi*             Finds all files that have “fi” anywhere in their file name
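If you want to check how the patterns in the table behave before running them against a real archive, a quick sketch like the one below runs each glob against a hypothetical list of file names on the default file system. It matches bare file names, as the program above does with getFileName(); GlobTableCheck is an illustrative name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GlobTableCheck {

    // Returns the names (from a hypothetical sample list) that a glob pattern accepts.
    static List<String> filter(String glob, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> hits = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "dest.sql", "final.sql", "mira.sql", "app.log", "fin.c", "notes.txt");
        for (String glob : Arrays.asList("*.sql", "*.{sql,log}", "fin*.sql", "fin.?", "*fi*")) {
            System.out.println(glob + " -> " + filter(glob, names));
        }
    }
}
```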


That completes our tutorial to search a ZIP file with a GLOB pattern. In the next example, we will discuss how to use regular expressions to search for ZIP file entries in Java NIO. Send us a comment to tell us how we are doing / if you have any questions on this tutorial.

Search ZIP File using Java - NIO ZPFS Example

In this tutorial, we will explore searching a ZIP archive for a specific file using Java, with an example program. Instead of the traditional java.util.zip ZipEntry approach, we will use NIO and ZPFS (Zip File System provider) to mount a ZIP file as a file system. We will then use the walkFileTree method available in NIO to scan the ZIP file tree and locate the file we want. Sounds interesting? NIO lets you probe your ZIP archive with the same Files and Path APIs you use for ordinary folders. The steps involved in this tutorial are explained in the sections that follow:

Searching using FileVisitor Interface:

In order to search any directory or file system recursively, the Java class we write should implement the methods of the FileVisitor interface. If you recall, we did the same in our recursive directory listing using NIO example.

To query a ZIP archive, we override the visitFile method and examine each incoming entry from the ZIP file to see if it matches the file we want to search for. If there is no match, we continue the walk. On a match, we print where the file is located in the ZIP archive and terminate the walk.
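To see the effect of the return value of visitFile, here is a minimal sketch, assuming a throwaway ZIP that contains two dest.sql entries in different folders. Returning TERMINATE after the first hit stops the walk, while CONTINUE lets it find every match. StopOrContinue and its helper methods are illustrative names, not part of the program in this post.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.*;

class StopOrContinue {

    // Creates a throwaway ZIP with two entries named dest.sql in different folders.
    static URI sampleZip() throws IOException {
        Path zipFile = Files.createTempDirectory("zpfs-stop").resolve("sp.zip");
        URI uri = URI.create("jar:" + zipFile.toUri());
        Map<String, String> env = new HashMap<>();
        env.put("create", "true");
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            Files.createDirectory(zipfs.getPath("/sp"));
            Files.write(zipfs.getPath("/dest.sql"), new byte[] {1});
            Files.write(zipfs.getPath("/sp/dest.sql"), new byte[] {2});
        }
        return uri;
    }

    // Counts entries named `target`; stopAtFirst returns TERMINATE instead of CONTINUE after a hit.
    static int countMatches(URI uri, String target, boolean stopAtFirst) throws IOException {
        List<Path> hits = new ArrayList<>();
        Map<String, String> env = new HashMap<>();
        env.put("create", "false");
        try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
            for (Path root : zipfs.getRootDirectories()) {
                Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                    @Override
                    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                        Path name = file.getFileName();
                        if (name != null && name.toString().equals(target)) {
                            hits.add(file);
                            return stopAtFirst ? FileVisitResult.TERMINATE : FileVisitResult.CONTINUE;
                        }
                        return FileVisitResult.CONTINUE;
                    }
                });
            }
        }
        return hits.size();
    }

    static int demo(boolean stopAtFirst) {
        try {
            return countMatches(sampleZip(), "dest.sql", stopAtFirst);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TERMINATE: " + demo(true) + " match(es)");
        System.out.println("CONTINUE:  " + demo(false) + " match(es)");
    }
}
```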


The other FileVisitor methods just return CONTINUE; they need no custom logic for this example.

Read File Name to be Searched:

In this step, we create a Path object holding the name of the file that needs to be located in the ZIP file. In our case, the file we want to locate is dest.sql. This file is available in two places in our sample ZIP file. Refer to the screenshot below:
Search ZIP File using ZPFS / NIO / Java - Example
One file is located in the root folder and the other one is located in the “sp” folder. We want the program to spot both locations and print them in the output. Full code is provided at the end of this tutorial.


Create ZIP File System:

We have to create a ZIP File System in order to probe the file. We have discussed the steps to create a ZIP File System quite a lot in this blog. A code snippet to create ZPFS in Java is shown below:

        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties);


Get root directories to Search:

Once the file system is created, we use the getRootDirectories method to get all root paths inside the ZIP file (a ZIP file system has a single "/" root) on which we have to search for the input file. We walk the root directories, and the folders and subfolders inside them, recursively to locate every occurrence of our input file. Note that we match files only by name in this tutorial.

The search is done using the walkFileTree method available in the java.nio.file.Files class. We use a public boolean variable as a flag that is set when a match is found. If no match is found at the end of the search, we output a message to the user that the searched file is not available in the ZIP archive.
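The program in this post communicates through a public boolean flag. As an alternative (not the approach used in this post), Java 8 and later offer Files.find, which walks a tree and returns a Stream of matching paths, so the search can simply return an Optional. The sketch below assumes a throwaway ZIP with a single /sp/dest.sql entry; FindInZip, findFirst and demo are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.*;
import java.util.*;
import java.util.stream.Stream;

class FindInZip {

    // Returns the first regular file whose name equals `fileName`, or Optional.empty() if none.
    static Optional<Path> findFirst(FileSystem zipfs, String fileName) throws IOException {
        for (Path root : zipfs.getRootDirectories()) {
            try (Stream<Path> matches = Files.find(root, Integer.MAX_VALUE,
                    (path, attrs) -> attrs.isRegularFile()
                            && path.getFileName() != null
                            && path.getFileName().toString().equals(fileName))) {
                Optional<Path> hit = matches.findFirst();
                if (hit.isPresent()) {
                    return hit;
                }
            }
        }
        return Optional.empty();
    }

    // Builds a small sample ZIP, then reports whether `fileName` is inside it.
    static String demo(String fileName) {
        try {
            Path zipFile = Files.createTempDirectory("zpfs-find").resolve("sp.zip");
            URI uri = URI.create("jar:" + zipFile.toUri());
            Map<String, String> env = new HashMap<>();
            env.put("create", "true");
            try (FileSystem zipfs = FileSystems.newFileSystem(uri, env)) {
                Files.createDirectory(zipfs.getPath("/sp"));
                Files.write(zipfs.getPath("/sp/dest.sql"), new byte[] {1});
                return findFirst(zipfs, fileName)
                        .map(p -> "ZIP File Contains " + fileName + " at " + p)
                        .orElse("ZIP File does not contain " + fileName);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("dest.sql"));
        System.out.println(demo("missing.sql"));
    }
}
```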


Search ZIP Archive for File – Complete Java NIO ZPFS Program

The complete Java program that recursively searches the entire ZIP archive for a file is shown below:

import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.*;
import java.io.IOException;
import java.util.*;
import java.net.URI;

class Search implements FileVisitor<Path> {
    /* Path holding the name of the file to search for */
    private final Path searchedFile;
    /* This flag is set to true if the file is found */
    public boolean file_found_flag;

    public Search(Path searchedFile) {
       this.searchedFile = searchedFile;
       this.file_found_flag = false;
    }

    @Override
    public FileVisitResult visitFile(Path incoming_file, BasicFileAttributes attrs)
    throws IOException {
        Path name = incoming_file.getFileName();
        /* Compare names as strings: Paths from different file systems never compare equal */
        if (name != null && name.toString().equals(searchedFile.toString())) {
            System.out.println("ZIP File Contains " + searchedFile +
            " at " + incoming_file.toRealPath().toString());
            file_found_flag = true;
        }

        if (!file_found_flag) {
            return FileVisitResult.CONTINUE;
        } else {
            // Terminates search on first match. Set this to CONTINUE to find all matches
            return FileVisitResult.TERMINATE;
        }
    }
    /* We don't use these, but FileVisitor requires them, so they just continue the walk */
    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc)
    throws IOException {
        return FileVisitResult.CONTINUE;
    }

    public static void main(String args[]) throws IOException {

        /* The file that needs to be searched inside the ZIP File */
        Path searchFile = Paths.get("dest.sql");
        Search walk = new Search(searchFile);

        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        zip_properties.put("create", "false");
        URI zip_disk = URI.create("jar:file:/sp.zip");
        /* try-with-resources closes the ZIP file system when the search is done */
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {
            Iterable<Path> dirs = zipfs.getRootDirectories();
            for (Path root : dirs) {
                if (!walk.file_found_flag) {
                    Files.walkFileTree(root, walk);
                }
            }
        }
        if (!walk.file_found_flag) {
            System.out.println("ZIP File does not contain " + searchFile);
        }
    }

}

As written, the program stops at the first match. To list all the matches, change FileVisitResult.TERMINATE in the visitFile method to CONTINUE. With that change, the output of this program is shown below:

ZIP File Contains dest.sql at /dest.sql
ZIP File Contains dest.sql at /sp/dest.sql

Voilà! You have created a program to search a ZIP file in Java. You can enhance this program to search for files using regular expressions or wild cards, all inside a ZIP file. This NIO + ZPFS combination opens up a plethora of search capabilities for your ZIP files with very little code. We will discuss more examples of searching ZIP files in upcoming posts. In the meantime, if you have any questions on this post, give us a shout through a comment.

Rename ZIP entries with NIO / ZPFS - Java Example

In this example, we will discuss how to rename entries inside a ZIP file in Java using the Zip File System Provider (ZPFS). ZPFS is part of Java NIO, and you will see how easily this task can be accomplished with very little code. Without ZPFS, you would have to extract the entire archive, rename the individual files and then create a new ZIP file from scratch. This gets really complicated if your ZIP file is huge or if it contains a massive number of entries. ZPFS avoids these hassles by treating your input file as a file system. With this introduction, we provide a step-by-step guide below:

Create Input ZIP File:

In this step, we create an input ZIP file that we will use in our Java program. This file has an entry “dest.sql”, which we will rename to “dest_4.sql” shortly. A screenshot of the input file is shown below:

Input ZIP File - Rename ZIP Entries in Java
Access ZIP File – Create ZIP File System

We have discussed this step in our tutorial on adding files to an existing ZIP archive. We need to create a ZIP File System for the file that we want to act on. The sample code is shown below:


        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>(); 
        /* We want to read an existing ZIP File, so we set this to False */
        zip_properties.put("create", "false"); 
        
        /* Specify the path to the ZIP File that you want to read as a File System */
        URI zip_disk = URI.create("jar:file:/my_zip_file.zip");
        
        /* Create ZIP file System */
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {


Read ZIP Entry to Rename

Once you have created a ZPFS, all you need to do is create a Path object that points to the file that you want to rename (in our case, dest.sql). The Java code snippet is provided below:

            /* Access file that needs to be renamed */
            Path pathInZipfile = zipfs.getPath("dest.sql");


Specify new File Name for the ZIP Entry

You will need to specify the new file name through a Path object as shown below:


            /* Specify new file name */
            Path renamedZipEntry = zipfs.getPath("dest_4.sql");

Execute Rename:

In this step, we use the move method of the java.nio.file.Files class, passing StandardCopyOption.ATOMIC_MOVE as the CopyOption, to do the rename. It is just one line of code, and it renames the ZIP entry. Bang!


            /* Execute rename */
            Files.move(pathInZipfile,renamedZipEntry,StandardCopyOption.ATOMIC_MOVE);
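Here is a minimal, self-contained sketch of the whole rename cycle, assuming a throwaway ZIP created in a temp folder. It writes dest.sql, renames it to dest_4.sql with Files.move, and then reopens the archive to list its entries. ZPFS writes the changes back to the ZIP file when the file system is closed, which is why each step uses try-with-resources. This sketch calls Files.move without ATOMIC_MOVE; ZipRenameSketch and its methods are illustrative names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.*;
import java.util.*;

class ZipRenameSketch {

    // Mounts a ZIP file as a file system; create = true makes ZPFS create a new archive.
    static FileSystem mount(URI uri, boolean create) throws IOException {
        Map<String, String> env = new HashMap<>();
        env.put("create", String.valueOf(create));
        return FileSystems.newFileSystem(uri, env);
    }

    // Creates a ZIP holding dest.sql, renames it to dest_4.sql, then reopens the ZIP
    // and returns the names of its top-level entries.
    static List<String> demo() {
        try {
            Path zipFile = Files.createTempDirectory("zpfs-rename").resolve("my_zip_file.zip");
            URI uri = URI.create("jar:" + zipFile.toUri());
            try (FileSystem zipfs = mount(uri, true)) {
                Files.write(zipfs.getPath("dest.sql"), new byte[] {42});
            }
            try (FileSystem zipfs = mount(uri, false)) {
                Files.move(zipfs.getPath("dest.sql"), zipfs.getPath("dest_4.sql"));
            } // the rename is written back to the ZIP file when the file system closes
            List<String> names = new ArrayList<>();
            try (FileSystem zipfs = mount(uri, false);
                 DirectoryStream<Path> entries = Files.newDirectoryStream(zipfs.getPath("/"))) {
                for (Path entry : entries) {
                    names.add(entry.getFileName().toString());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```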

Complete Java Program to rename ZIP Entries:

Putting it all together, the complete Java program that uses ZPFS / NIO to rename ZIP entries is shown below:

import java.util.*;
import java.net.URI;
import java.nio.file.*;

public class ZPFSRename {
    public static void main(String [] args) throws Exception {

        /* Define ZIP File System Properties in HashMap */
        Map<String, String> zip_properties = new HashMap<>();
        /* We want to read an existing ZIP File, so we set this to False */
        zip_properties.put("create", "false");

        /* Specify the path to the ZIP File that you want to read as a File System */
        URI zip_disk = URI.create("jar:file:/my_zip_file.zip");

        /* Create ZIP file System; the rename is written to the ZIP file when it is closed */
        try (FileSystem zipfs = FileSystems.newFileSystem(zip_disk, zip_properties)) {
            /* Access file that needs to be renamed */
            Path pathInZipfile = zipfs.getPath("dest.sql");
            /* Specify new file name */
            Path renamedZipEntry = zipfs.getPath("dest_4.sql");
            System.out.println("About to rename an entry from ZIP File: " + pathInZipfile.toUri());
            /* Execute rename */
            Files.move(pathInZipfile, renamedZipEntry, StandardCopyOption.ATOMIC_MOVE);
            System.out.println("File successfully renamed");
        }
    }
}

The output ZIP file screenshot is provided below. You can see that the file has been renamed successfully.

Output ZIP File with Renamed ZIP Entries
That completes the Java example to rename ZIP file entries using NIO / ZPFS. As you can see, we have not extracted the ZIP file at all! ZPFS makes it really easy to manipulate ZIP archives.

Search Word Document Using Java Example

In this tutorial, we will examine how to search a Microsoft Word document for a text / pattern using a Java program. We will use Apache POI to parse the Word document and java.util.Scanner to perform a string search, and report the number of occurrences of the input pattern as the output. This tutorial is targeted at beginners, and you should feel free to extend the code provided in this post for your project requirements. The list of steps for this guide is captured below:

Step by Step Guide to Search Word Document in Java

1. Read Word Document in Java


In order to search the contents of a Word document, we first need to open it as a java.io.FileInputStream. The code to do this is straightforward, and is shown below:

                /* Create a FileInputStream object to read the input MS Word Document */
                FileInputStream input_document = new FileInputStream(new File("test_document.doc")); 


2. Parse Document Text Using Apache POI


In this step, we will use WordExtractor, defined in org.apache.poi.hwpf.extractor.WordExtractor, to extract the contents of the Word document. To create an object of type WordExtractor, we pass the FileInputStream object created in Step 1 to its constructor. Apache POI provides this class for converting Word documents to plain text. Note that WordExtractor (part of HWPF) reads the older binary .doc format; .docx files need XWPFWordExtractor from the poi-ooxml module instead. The code to initialize the WordExtractor object is shown below:


                /* Create Word Extractor object */
                WordExtractor my_word=new WordExtractor(input_document);                

3. Create Scanner / Define Search Pattern


Once you have created the WordExtractor object, you can get the entire document text as a string with its getText method and pass it to a java.util.Scanner object. You should also define the search pattern at this stage using the java.util.regex.Pattern class, which gives you the power to use regular expressions in your search. For now, let us focus on a simple example: we will count the number of times the word “search” appears in our test document. The Java code is provided below:

                /* Create Scanner object */             
                Scanner document_scanner = new Scanner(my_word.getText());
                /* Define Search Pattern - we look for the word "search" */
                Pattern words = Pattern.compile("(search)");
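Keep in mind that the pattern (search) is case-sensitive and also matches inside longer words such as “research”. If that is not what you want, java.util.regex lets you add word boundaries (\b) and the CASE_INSENSITIVE flag. The sketch below counts matches with Matcher.find() on plain Strings that stand in for my_word.getText(), so it runs without Apache POI; PatternCount is an illustrative name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternCount {

    // Counts non-overlapping matches of `pattern` anywhere in `text`.
    static int count(String text, Pattern pattern) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        Pattern plain = Pattern.compile("(search)");
        Pattern wholeWordAnyCase = Pattern.compile("\\bsearch\\b", Pattern.CASE_INSENSITIVE);
        // Stand-ins for my_word.getText(); no Word document is read here
        String[] samples = {"research and search", "Search and search"};
        for (String text : samples) {
            System.out.println(text + " -> plain: " + count(text, plain)
                    + ", whole word, any case: " + count(text, wholeWordAnyCase));
        }
    }
}
```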


4. Perform Search / Find All Matches


Using the Scanner object created earlier, we loop through the document text line by line with the hasNextLine method. For every line, we call findInLine with the pattern created earlier to check whether the search word is present in that line, and increment the match counter by 1 for each match. The search word can appear more than once in the same line. Because findInLine advances the scanner past each match it finds, calling it again in a loop finds the remaining occurrences on that line; when it returns null, we move to the next line with nextLine.

                String key;
                int count = 0;
                /* Scan through every line */
                while (document_scanner.hasNextLine()) {
                        /* search for the pattern in scanned line */
                        key = document_scanner.findInLine(words);
                        while (key != null) {
                                /* Increment counter for the match */
                                count++;
                                /* findInLine moves past the match, so this finds the next one on the same line */
                                key = document_scanner.findInLine(words);
                        }
                        /* Scan next line in document, if there is one */
                        if (document_scanner.hasNextLine()) {
                                document_scanner.nextLine();
                        }
                }
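The line-by-line loop can be tried without a Word document. The sketch below runs a findInLine loop over a hard-coded String standing in for my_word.getText(). It relies on the fact that findInLine advances past each match, so calling it repeatedly picks up every occurrence on a line without any extra next() call. ScannerCount is a hypothetical class name.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

class ScannerCount {

    // Counts matches line by line with Scanner.findInLine.
    static int count(String text, Pattern pattern) {
        int count = 0;
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                // Each successful findInLine moves past the match it returned
                while (scanner.findInLine(pattern) != null) {
                    count++;
                }
                // Skip the rest of the current line; guard against end of input
                if (scanner.hasNextLine()) {
                    scanner.nextLine();
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "search search search\nno match here\nsearch";
        System.out.println("Found Input " + count(text, Pattern.compile("(search)")) + " times");
    }
}
```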

5. Output Search Result


You are now ready to output the search result. It is just a print of the number of times the match was found, using a simple System.out.println statement.

                        /* Print number of times the search pattern was found */
                        System.out.println("Found Input "+ count + " times");


Search Word Document using Java – Complete Program


The complete code to implement a search functionality in Microsoft Word documents using Java language is shown below. You can treat this as a prototype and extend it further for any search needs.

import java.io.File;
import java.io.FileInputStream;
import org.apache.poi.hwpf.extractor.WordExtractor;
import java.util.Scanner;
import java.util.regex.Pattern;
public class searchWord {
        public static void main(String[] args) throws Exception {
                String document_text;
                /* Create a FileInputStream object to read the input MS Word Document (.doc format) */
                try (FileInputStream input_document = new FileInputStream(new File("test_document.doc"))) {
                        /* Create Word Extractor object and read the document text */
                        WordExtractor my_word = new WordExtractor(input_document);
                        document_text = my_word.getText();
                }
                /* Create Scanner object */
                Scanner document_scanner = new Scanner(document_text);
                /* Define Search Pattern - we look for the word "search" */
                Pattern words = Pattern.compile("(search)");
                String key;
                int count = 0;
                /* Scan through every line */
                while (document_scanner.hasNextLine()) {
                        /* search for the pattern in scanned line */
                        key = document_scanner.findInLine(words);
                        while (key != null) {
                                /* Increment counter for the match */
                                count++;
                                /* findInLine moves past the match, so this finds the next one on the same line */
                                key = document_scanner.findInLine(words);
                        }
                        /* Scan next line in document, if there is one */
                        if (document_scanner.hasNextLine()) {
                                document_scanner.nextLine();
                        }
                }
                document_scanner.close();
                /* Print number of times the search pattern was found */
                System.out.println("Found Input " + count + " times");
        }
}

I tried this example on my Word document and it printed the matching count accurately on the screen. Give it a try, and let us know if you get stuck. Note that to compile this program, you need poi-3.8.jar (or an equivalent version) and poi-scratchpad-3.8.jar, which contains the HWPF WordExtractor class. You can download both from the Apache POI distribution.