org.gcube.application.framework.contentmanagement.datatransformation.util
Class DataTransformationUtils
java.lang.Object
org.gcube.application.framework.contentmanagement.datatransformation.util.DataTransformationUtils
public class DataTransformationUtils
- extends java.lang.Object
Method Summary |
static java.util.ArrayList<DocumentInfos> |
getListOfFailuresFromReport(java.lang.String rsLocator,
java.util.ArrayList<DocumentInfos> allDocuments,
java.util.ArrayList<java.lang.String> collectionId)
Parses the DTS reports contained in the resultset and returns the list of documents that failed to be transformed. |
static java.util.ArrayList<DocumentInfos> |
getReports(java.lang.String rsLocator,
java.util.ArrayList<java.lang.String> collectionId)
|
static java.util.ArrayList<java.lang.String> |
performOCRtoPDF_HTTPInput(java.util.ArrayList<DocumentInfos> documents,
java.lang.String outputCollectionId,
ASLSession session)
Transforms a list of PDF documents to text, using the OCR Service. |
static java.lang.String |
transformPDFDocumentsToText(java.lang.String listLocation,
java.util.ArrayList<java.lang.String> collectionId,
java.lang.String collectionName,
java.lang.String scope)
Transforms a list of PDF documents to Text documents, using DTS. |
Methods inherited from class java.lang.Object |
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
DataTransformationUtils
public DataTransformationUtils()
transformPDFDocumentsToText
public static java.lang.String transformPDFDocumentsToText(java.lang.String listLocation,
java.util.ArrayList<java.lang.String> collectionId,
java.lang.String collectionName,
java.lang.String scope)
throws ServiceEPRRetrievalException,
TransformationException
- Transforms a list of PDF documents to Text documents, using DTS. It returns an RSLocator of the resultset containing the reports for the transformations.
- Parameters:
listLocation
- - the location of the file containing the document URIs
collectionId
- - the id of the requested output collection (empty if a new collection is to be created)
collectionName
- - the name of the requested output collection
scope
-
- Returns:
- the RSLocator of the resultset containing the reports from the transformation
- Throws:
ServiceEPRRetrievalException
TransformationException
getListOfFailuresFromReport
public static java.util.ArrayList<DocumentInfos> getListOfFailuresFromReport(java.lang.String rsLocator,
java.util.ArrayList<DocumentInfos> allDocuments,
java.util.ArrayList<java.lang.String> collectionId)
throws ReadingRSException
- Parses the DTS reports contained in the resultset and returns the list of documents that failed to be transformed.
- Parameters:
rsLocator
- - the RSLocator containing the reports from DTS
allDocuments
- - list of all the documents that participated in the transformation attempt
collectionId
- - empty list to be filled with the id of the output collection
- Returns:
- the documents that failed to be transformed
- Throws:
ReadingRSException
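The collectionId parameter above reads as an out-parameter: the caller passes an empty list and reads the output collection id from it afterwards. The sketch below illustrates that calling convention only. It does not use the real library: DocStub is a hypothetical stand-in for DocumentInfos, failuresFromReport is a fake with the same parameter shape, and the URIs, the collection id and the failure criterion are all made up. That the method itself fills the list is an assumption drawn from the parameter description.

```java
import java.util.ArrayList;
import java.util.Arrays;

class FailureReportSketch {

    // Hypothetical stand-in for the library's DocumentInfos; only a URI is assumed.
    static class DocStub {
        final String uri;
        DocStub(String uri) { this.uri = uri; }
    }

    // Mimics the documented parameter shape of getListOfFailuresFromReport.
    // The report parsing is faked; the real method reads reports from the resultset.
    static ArrayList<DocStub> failuresFromReport(String rsLocator,
                                                 ArrayList<DocStub> allDocuments,
                                                 ArrayList<String> collectionId) {
        collectionId.add("hypothetical-collection-id"); // out-parameter is filled here
        ArrayList<DocStub> failed = new ArrayList<>();
        for (DocStub doc : allDocuments) {
            if (doc.uri.contains("broken")) { // made-up failure criterion
                failed.add(doc);
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        ArrayList<DocStub> all = new ArrayList<>(Arrays.asList(
                new DocStub("cm://docs/ok.pdf"), new DocStub("cm://docs/broken.pdf")));
        ArrayList<String> collectionId = new ArrayList<>(); // passed in empty

        ArrayList<DocStub> failed = failuresFromReport("rs-locator", all, collectionId);

        // After the call, the list carries the output collection id.
        System.out.println("output collection: " + collectionId.get(0));
        for (DocStub doc : failed) {
            System.out.println("failed: " + doc.uri);
        }
    }
}
```

With the real class, the same pattern would apply to DataTransformationUtils.getListOfFailuresFromReport, with ReadingRSException handled by the caller.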
getReports
public static java.util.ArrayList<DocumentInfos> getReports(java.lang.String rsLocator,
java.util.ArrayList<java.lang.String> collectionId)
throws ReadingRSException
- Throws:
ReadingRSException
performOCRtoPDF_HTTPInput
public static java.util.ArrayList<java.lang.String> performOCRtoPDF_HTTPInput(java.util.ArrayList<DocumentInfos> documents,
java.lang.String outputCollectionId,
ASLSession session)
throws ServiceEPRRetrievalException,
OCRException
- Transforms a list of PDF documents to text, using the OCR Service. It returns a list of the CM URIs of the output documents.
It also copies the generated output to the collection given as a parameter.
- Parameters:
documents
- - the list of documents to be transformed
outputCollectionId
- - the collection into which the output will be inserted
session
-
- Returns:
- list of CM URIs of the transformed documents
- Throws:
ServiceEPRRetrievalException
OCRException
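One plausible way to combine the methods on this page is to send only the documents that failed the DTS transformation to the OCR Service. This page does not state that the methods are meant to be chained this way, so treat the following as a sketch under that assumption. FailureLister and OcrRunner are hypothetical interfaces standing in for getListOfFailuresFromReport and performOCRtoPDF_HTTPInput, documents are represented as plain URI strings instead of DocumentInfos, and the ASLSession argument is left out so the example runs without the gCube classes.

```java
import java.util.ArrayList;
import java.util.Arrays;

class OcrFallbackSketch {

    // Hypothetical seam standing in for getListOfFailuresFromReport.
    interface FailureLister {
        ArrayList<String> failures(String rsLocator, ArrayList<String> allDocuments,
                                   ArrayList<String> collectionId);
    }

    // Hypothetical seam standing in for performOCRtoPDF_HTTPInput (session omitted).
    interface OcrRunner {
        ArrayList<String> ocr(ArrayList<String> documents, String outputCollectionId);
    }

    // Sends only the failed documents to OCR, targeting the collection id
    // reported back through the collectionId out-parameter.
    static ArrayList<String> ocrFailedDocuments(String rsLocator,
                                                ArrayList<String> allDocuments,
                                                FailureLister lister,
                                                OcrRunner runner) {
        ArrayList<String> collectionId = new ArrayList<>(); // starts empty, filled by lister
        ArrayList<String> failed = lister.failures(rsLocator, allDocuments, collectionId);
        if (failed.isEmpty() || collectionId.isEmpty()) {
            return new ArrayList<>(); // nothing to retry, or no target collection known
        }
        return runner.ocr(failed, collectionId.get(0));
    }

    public static void main(String[] args) {
        ArrayList<String> docs = new ArrayList<>(Arrays.asList("a.pdf", "b.pdf"));

        // Fake collaborators: "b.pdf" is reported as failed, and OCR output
        // URIs are built by prefixing the collection id (both made up).
        ArrayList<String> outputUris = ocrFailedDocuments("rs-locator", docs,
                (rs, all, ids) -> {
                    ids.add("col-1");
                    return new ArrayList<>(Arrays.asList("b.pdf"));
                },
                (failed, col) -> {
                    ArrayList<String> result = new ArrayList<>();
                    for (String d : failed) {
                        result.add(col + "/" + d);
                    }
                    return result;
                });

        System.out.println(outputUris);
    }
}
```

Keeping the two library calls behind small interfaces is a choice made for this sketch so the control flow can be exercised with fakes. Real code would call the static methods directly and handle ReadingRSException, ServiceEPRRetrievalException and OCRException.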