A client of mine wanted to use the power of Oracle Text to search for keywords in BLOBs stored within their database, using APEX as the front-end.
I’d never worked with Oracle Text before, and did not find too many posts about it’s integration with APEX, so I thought I’d write up how I went about it, as it may be of interest to others.
Step 1: Privileges for schema owner
The schema owner where your app resides will need some grants before being able to use the Oracle Text packages. Ensure you’ve granted the following:
GRANT CTXAPP to
GRANT EXECUTE ON CTXSYS.CTX_CLS TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_DDL TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_DOC TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_OUTPUT TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_QUERY TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_REPORT TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_THES TO [schema-owner];
GRANT EXECUTE ON CTXSYS.CTX_ULEXER TO [schema-owner];
Step 2: Create the index on your BLOB table
create index hdocsx on [table-name]([blob-column]) indextype is ctxsys.context parameters ('filter ctxsys.auto_filter');
I used the auto-filter parameter as this is recommended for PDF documents, which is what we happened to be working with. Read the Oracle documentation to ensure you use the filter that is right for your requirements.
Important: ensure your table to be indexed has a primary key defined before you create your index or it will be unusable for some of the Oracle Text functions such as MARKUP or SNIPPET, for example.
Step 3: Create an APEX report page, with a regular Report Region on your BLOB table
You can now include Oracle Text functions such as CONTAINS. For example, with a blob field ‘CONTENTS’, and a form search field ‘:P1_SEARCH’, you could use
select file_name, doc_id, xxx, yyy
from BLOB_TABLE
where CONTAINS(contents,’:P1_SEARCH’)>0
Such a query would return any documents that contain the word or search term the user entered in the P1_SEARCH field. So simple, yet so powerful!
Declarative support for BLOBS allows you to easily include a file download link for your documents.
Step 4: Create a marked up version of the document
My client wanted us to highlight the search terms within the document so that the user could easily see why the document was returned in the search results. To do this, I created a popup form that would open when the user would click on a ‘Preview’ link.
The popup form had a single Dynamic PL/SQL region, and a single (hidden) item containing the document’s ID. The PL/SQL region had the following Source:
declare v_clob_selected CLOB; v_read_amount integer; v_read_offset integer; v_buffer varchar2(32767); v_query varchar(2000); v_cursor integer; begin htp.p('<b>HTML version with highlighted terms</b>'); begin ctx_doc.markup (index_name => 'IDX_DOCS', -- name of Oracle Text index created in Step 2 textkey => : P5_ID, -- hidden item on page containing ID (primary key) of BLOB table text_query => : P1_SEARCH, -- item containing the term the user searched for on page 1 restab => v_clob_selected, starttag => '<i><font color=red>', -- I highlighted the terms in red, feel free to use any other style! endtag => '</font></i>'); v_read_amount := 32767; v_read_offset := 1; begin loop dbms_lob.read(v_clob_selected,v_read_amount,v_read_offset,v_buffer); htp.p(v_buffer); v_read_offset := v_read_offset + v_read_amount; v_read_amount := 32767; end loop; exception when no_data_found then null; end; exception when others then null; --showHTMLdoc(p_id); end; end;
As simple as that! This procedure converts the BLOB into HTML, with the search terms marked up. It does the best it can, and may not work well on ALL document types, but most of the PDFs we worked with rendered nicely.
One could also provide a ‘SNIPPET’ of information in the results list, using CTX_DOCS.SNIPPET, that would display a snippet containing the search term, so that a user could scan a larger result set without having to open each document.
More information on the Markup and Snippet functions can be found in Oracle Text Reference.
This is very interesting and my shop will be looking at this approach for a upcoming project … Thanks … I’ll document as the source if we go with it.
The WHEN OTHERS THEN null exception is a bit dangerous. I recognize this was for demo purposes so just mentioning for people who grab chunks of code off the web without looking. The WHEN OTHERS THEN null will cause the routine to act like it worked even when it did not so