org.gcube.application.framework.contentmanagement.datatransformation.util
Class DataTransformationUtils

java.lang.Object
  extended by org.gcube.application.framework.contentmanagement.datatransformation.util.DataTransformationUtils

public class DataTransformationUtils
extends java.lang.Object


Constructor Summary
DataTransformationUtils()
           
 
Method Summary
static java.util.ArrayList<DocumentInfos> getListOfFailuresFromReport(java.lang.String rsLocator, java.util.ArrayList<DocumentInfos> allDocuments, java.util.ArrayList<java.lang.String> collectionId)
          Parses the reports in the result set returned by DTS and returns the list of documents that failed to be transformed.
static java.util.ArrayList<DocumentInfos> getReports(java.lang.String rsLocator, java.util.ArrayList<java.lang.String> collectionId)
           
static java.util.ArrayList<java.lang.String> performOCRtoPDF_HTTPInput(java.util.ArrayList<DocumentInfos> documents, java.lang.String outputCollectionId, ASLSession session)
          Transforms a list of PDF documents to text, using OCR Service.
static java.lang.String transformPDFDocumentsToText(java.lang.String listLocation, java.util.ArrayList<java.lang.String> collectionId, java.lang.String collectionName, java.lang.String scope)
          Transforms a list of PDF documents to Text documents, using DTS.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DataTransformationUtils

public DataTransformationUtils()
Method Detail

transformPDFDocumentsToText

public static java.lang.String transformPDFDocumentsToText(java.lang.String listLocation,
                                                           java.util.ArrayList<java.lang.String> collectionId,
                                                           java.lang.String collectionName,
                                                           java.lang.String scope)
                                                    throws ServiceEPRRetrievalException,
                                                           TransformationException
Transforms a list of PDF documents to Text documents, using DTS. It returns an RSLocator of the resultset containing the reports for the transformations.

Parameters:
listLocation - the location of the file containing the document URIs
collectionId - the requested output collection id (empty if a new collection is to be created)
collectionName - the name of the requested output collection
scope -
Returns:
the RSLocator of the result set containing the reports from the transformation
Throws:
ServiceEPRRetrievalException
TransformationException

getListOfFailuresFromReport

public static java.util.ArrayList<DocumentInfos> getListOfFailuresFromReport(java.lang.String rsLocator,
                                                                             java.util.ArrayList<DocumentInfos> allDocuments,
                                                                             java.util.ArrayList<java.lang.String> collectionId)
                                                                      throws ReadingRSException
Parses the reports in the result set returned by DTS and returns the list of documents that failed to be transformed.

Parameters:
rsLocator - the RSLocator of the result set containing the reports from DTS
allDocuments - the list of all documents that took part in the transformation attempt
collectionId - an empty list that the method fills with the id of the output collection
Returns:
the documents that failed to be transformed
Throws:
ReadingRSException
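
The collectionId parameter above works as an out-parameter: the caller passes an empty list and reads the output collection id from it after the call. The following is a minimal sketch of that calling pattern, not the library's implementation. Doc is a hypothetical stand-in for DocumentInfos, and failuresFrom is a hypothetical stand-in for getListOfFailuresFromReport that takes the failed URIs and the output collection id directly instead of reading reports from a result set. It also assumes that the returned documents are a subset of allDocuments, and it omits the checked ReadingRSException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class FailureReportSketch {

    /** Hypothetical stand-in for DocumentInfos, reduced to a URI. */
    static final class Doc {
        final String uri;
        Doc(String uri) { this.uri = uri; }
    }

    /**
     * Hypothetical stand-in for getListOfFailuresFromReport. The failedUris set
     * and the outputCollection string replace the reports that the real method
     * reads from the result set behind rsLocator.
     */
    static ArrayList<Doc> failuresFrom(Set<String> failedUris,
                                       String outputCollection,
                                       ArrayList<Doc> allDocuments,
                                       ArrayList<String> collectionId) {
        // Out-parameter: the caller's empty list receives the output collection id.
        collectionId.add(outputCollection);

        // Keep only the documents whose URI was reported as failed.
        ArrayList<Doc> failed = new ArrayList<Doc>();
        for (Doc doc : allDocuments) {
            if (failedUris.contains(doc.uri)) {
                failed.add(doc);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        ArrayList<Doc> all = new ArrayList<Doc>(Arrays.asList(
                new Doc("uri:a"), new Doc("uri:b"), new Doc("uri:c")));
        Set<String> failedUris = new HashSet<String>(Arrays.asList("uri:b"));

        ArrayList<String> collectionId = new ArrayList<String>(); // starts empty
        ArrayList<Doc> failed = failuresFrom(failedUris, "collection-1", all, collectionId);

        System.out.println(failed.size() + " failed, output collection " + collectionId.get(0));
    }
}
```

With the real class, the caller would wrap the call in a try block for ReadingRSException and read collectionId only after the call returns normally.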

getReports

public static java.util.ArrayList<DocumentInfos> getReports(java.lang.String rsLocator,
                                                            java.util.ArrayList<java.lang.String> collectionId)
                                                     throws ReadingRSException
Throws:
ReadingRSException

performOCRtoPDF_HTTPInput

public static java.util.ArrayList<java.lang.String> performOCRtoPDF_HTTPInput(java.util.ArrayList<DocumentInfos> documents,
                                                                              java.lang.String outputCollectionId,
                                                                              ASLSession session)
                                                                       throws ServiceEPRRetrievalException,
                                                                              OCRException
Transforms a list of PDF documents to text, using OCR Service. It returns a list of the CM URIs of the output documents. It also copies the generated output to the collection given as a parameter.

Parameters:
documents - the list of documents to be transformed
outputCollectionId - the collection into which the output will be inserted
session -
Returns:
the list of CM URIs of the transformed documents
Throws:
ServiceEPRRetrievalException
OCRException
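
The methods of this class can plausibly be chained: transform with DTS, collect the failures from the report, then send only the failed documents to the OCR service. This page does not state that workflow, so the sketch below is an assumption about usage rather than documented behaviour. PdfTransformer is a hypothetical interface that mirrors the shape of the three documented methods, with String URIs in place of DocumentInfos, without the ASLSession argument, and without the checked exceptions. FakeTransformer is an in-memory fake used only to make the sketch runnable.

```java
import java.util.ArrayList;
import java.util.Arrays;

public class OcrFallbackSketch {

    /** Hypothetical interface shaped after the three documented static methods. */
    interface PdfTransformer {
        String transformToText(String listLocation, ArrayList<String> collectionId,
                               String collectionName, String scope);
        ArrayList<String> failures(String rsLocator, ArrayList<String> allDocuments,
                                   ArrayList<String> collectionId);
        ArrayList<String> ocr(ArrayList<String> documents, String outputCollectionId);
    }

    /** Runs the transformation, then retries the failed documents with OCR. */
    static ArrayList<String> transformWithFallback(PdfTransformer t,
                                                   String listLocation,
                                                   ArrayList<String> allDocuments,
                                                   String collectionName,
                                                   String scope) {
        // An empty collection id list requests a new output collection.
        String rsLocator = t.transformToText(
                listLocation, new ArrayList<String>(), collectionName, scope);

        // The second empty list is filled with the output collection id.
        ArrayList<String> collectionId = new ArrayList<String>();
        ArrayList<String> failed = t.failures(rsLocator, allDocuments, collectionId);

        if (failed.isEmpty() || collectionId.isEmpty()) {
            return new ArrayList<String>(); // nothing to retry
        }
        return t.ocr(failed, collectionId.get(0));
    }

    /** In-memory fake: treats URIs ending in "scanned.pdf" as failed transformations. */
    static final class FakeTransformer implements PdfTransformer {
        public String transformToText(String listLocation, ArrayList<String> collectionId,
                                      String collectionName, String scope) {
            return "fake-locator";
        }

        public ArrayList<String> failures(String rsLocator, ArrayList<String> allDocuments,
                                          ArrayList<String> collectionId) {
            collectionId.add("collection-1");
            ArrayList<String> failed = new ArrayList<String>();
            for (String uri : allDocuments) {
                if (uri.endsWith("scanned.pdf")) {
                    failed.add(uri);
                }
            }
            return failed;
        }

        public ArrayList<String> ocr(ArrayList<String> documents, String outputCollectionId) {
            ArrayList<String> out = new ArrayList<String>();
            for (String uri : documents) {
                out.add(outputCollectionId + "/" + uri + ".txt");
            }
            return out;
        }
    }

    public static void main(String[] args) {
        ArrayList<String> docs = new ArrayList<String>(
                Arrays.asList("a.pdf", "b-scanned.pdf"));
        ArrayList<String> ocrOutput = transformWithFallback(
                new FakeTransformer(), "list.txt", docs, "texts", "/scope");
        System.out.println(ocrOutput);
    }
}
```

Keeping the calls behind an interface is a choice made for the sketch so that it runs without the gCube services; real code would call the static methods of DataTransformationUtils directly and handle ServiceEPRRetrievalException, TransformationException, ReadingRSException and OCRException.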