
PDF Split and Merge (PDFsam)

I was first introduced to PDFsam when I was given the task of splitting about 300 conference proceedings into individual session talks, around 5500 sessions in total. The splitting needed to be exact, as the sessions were being uploaded individually to a different site. I worked out that I could split an individual PDF in around 10 minutes, give or take, depending on how much content was in each session. I figured there must be an easier and quicker way to split the documents into individual articles.

Not wanting to spend much money, if any, on a solution, I turned to Google. “open source PDF splitter” was the first set of keywords I entered that day, and the first result that came up was “PDF split and merge”. I clicked on the link, had a read over the site, then downloaded and printed off the instruction manual.

The slogan, “A free and open source tool to split and merge PDF documents”, meant that I had found my solution. I downloaded the Windows version and installed it on my machine. On first run, it looked as though it could only split one document in multiple places. I worked out that this would shave roughly 2 or 3 minutes off the processing time for each proceeding.

The documentation I had downloaded for the software mentioned a command line interface. This meant that I could create a text file with a command that would split the proceeding into individual articles. The command that I used was:

java -jar pdfsam-console-0.7.3.jar -f C:\file1.pdf -f C:\file1.pdf -o C:\output.pdf -u 1-1:2-8: concat

Explanation of the above command:

  • java -jar pdfsam-console-0.7.3.jar – runs the version of PDFsam that I was using at the time. Java is one of the requirements for PDFsam to run.
  • -f – the prefix used in the command line to indicate the input file (i.e. the file to be split)
  • -o – the prefix used in the command line to indicate the output file (i.e. the split file or finished product)
  • 1-1 – the range of pages needed to be taken from the first document (in this case it was the title page of the proceeding)
  • 2-8 – the range of pages that the session or talk spanned within the proceeding document
  • concat – the command used within the command line to combine the two sets of pages into one PDF (in my case, this combined the title page with the pages that related to the session of the conference).

Using this command for each session in each proceeding meant that I could split one proceeding in around 15 seconds (depending on the number of talks within each conference). Overall, the daunting process of splitting 300 conference proceedings was sped up, and the documents could be uploaded into the new system in a timely manner.
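Writing one command per session by hand gets tedious across thousands of sessions, so the text file of commands could itself be generated. The following is only a rough sketch in Python, not the method I actually used: the function names, the session page ranges and the output file naming are made up for illustration, and the command layout simply copies the one quoted above rather than anything checked against the PDFsam documentation.

```python
def build_concat_command(proceeding, output, title_pages, session_pages,
                         jar="pdfsam-console-0.7.3.jar"):
    """Build one command line in the same shape as the command quoted above.

    title_pages and session_pages are (first, last) page tuples.
    The layout of the -u value is copied from the post, not from the docs.
    """
    t_first, t_last = title_pages
    s_first, s_last = session_pages
    selection = f"{t_first}-{t_last}:{s_first}-{s_last}:"
    return (f"java -jar {jar} -f {proceeding} -f {proceeding} "
            f"-o {output} -u {selection} concat")


def build_batch(proceeding, sessions, out_dir):
    """Return one command per session, each prefixed with title page 1-1."""
    lines = []
    for number, pages in enumerate(sessions, start=1):
        # Hypothetical naming scheme: session001.pdf, session002.pdf, ...
        output = f"{out_dir}\\session{number:03d}.pdf"
        lines.append(build_concat_command(proceeding, output, (1, 1), pages))
    return lines


if __name__ == "__main__":
    # Hypothetical page ranges for three sessions in one proceeding.
    sessions = [(2, 8), (9, 15), (16, 21)]
    for line in build_batch(r"C:\file1.pdf", sessions, r"C:\output"):
        print(line)
```

The printed lines can be saved into a text or batch file and run in one go, which keeps the page ranges in one place where they are easy to check before anything is split.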

About PDFsam:

  • PDFsam is freely available for download from Sourceforge (http://sourceforge.net/projects/pdfsam/) or from the PDFsam website (www.pdfsam.org)
  • It requires Java to run
  • Versions available for Windows and MacOS.
  • Source code is freely available
  • Support forum and wiki available on the website as well as the documentation

To conclude, I would definitely recommend using PDFsam to split and merge a wide variety of PDF documents. It is very easy to configure and use, and it cut my processing time dramatically.

 
