We developed the Filestore Provider for a client in NZ who needed to store hundreds of millions of documents. Early in the sales cycle we had proposed I/PM, because it could already handle these kinds of volumes, but the customer rejected it because it only ran on Windows.
So we had to come up with a way to manage file storage for up to 200 million objects that made sense for the underlying file system. Filestore Provider met that need because it allowed us to modify the rules governing where files were stored (in all previous versions, native files were stored in a path /vault/…).
Filestore Provider lets us do a number of important things:
1. It lets us specify a different rule for the file paths, so we can embed metadata in them to a) distribute the files more evenly across the file system, and b) use these paths for hierarchical storage management, backup, auditing, etc.
2. It lets us define different storage rules for different types of content, so web content can be stored on one file system and catalog pages on another.
3. It lets us specify a database storage rule and then store content in a database, either as BLOBs in Oracle 10g (not recommended) or as SecureFiles in 11g (much, much better).
4. We can specify rules that tell the system to store only one copy of some content and two copies of other content. This is very important in large collections, where duplicated files add up really quickly.
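To make items 1, 2 and 4 more concrete, here is a minimal Python sketch of how a content-type rule and a metadata-driven path might fit together. It is not Filestore Provider's configuration syntax. The rule names, root directories, copy counts, the department field and the 10,000-per-bucket size are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


# Hypothetical storage rules: names, roots and copy counts are illustrative
# only and are NOT Filestore Provider configuration syntax.
@dataclass(frozen=True)
class StorageRule:
    name: str
    root: str
    copies: int  # how many copies of the native file to keep (item 4)


RULES = {
    "web": StorageRule("web", "/fs_web", 1),
    "catalog": StorageRule("catalog", "/fs_catalog", 2),
}
DEFAULT_RULE = StorageRule("default", "/fs_default", 1)


def select_rule(doc_type: str) -> StorageRule:
    """Choose a storage rule from the content type (item 2)."""
    return RULES.get(doc_type, DEFAULT_RULE)


def native_path(meta: dict) -> str:
    """Build a storage path from metadata (item 1).

    The department segment carries business meaning. The numeric bucket,
    derived from dID, caps each leaf directory at 10,000 consecutive dIDs
    (an assumed bucket size, well under 40,000 files per subdirectory).
    """
    rule = select_rule(meta["docType"])
    bucket = meta["dID"] // 10_000
    return f"{rule.root}/{meta['department']}/{bucket:05d}/{meta['dID']}.{meta['ext']}"
```

Under these assumptions, native_path({"docType": "catalog", "department": "sales", "dID": 1234567, "ext": "pdf"}) returns /fs_catalog/sales/00123/1234567.pdf. The department segment keeps the path meaningful for backup or auditing, while the bucket keeps any single directory from growing without bound.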
Our client in NZ has been in production with FSP since 2005 on UCM version 7.1.1, and FSP became a core feature (with a proper UI) in 10gR3. We tested with a 32 TB file system and 200 million objects, but a well-planned storage system could easily be larger than this.
Some key points to remember:
- Most file systems have a realistic maximum of 40,000 to 100,000 items per subdirectory or folder. Ignore the theoretical maximums and research what is realistic. Sun engineers advised us to aim for no more than 40,000 files per subdirectory.
- You must, must, must keep the groups/ / / string in the web-viewable storage path rule. Otherwise, the web server's security checks will be bypassed.
- You may need to create specific metadata values for the path to make sense. For instance, we created a set of metadata values for year, month and day by breaking apart dInDate. The client was then able to move files to lower-cost disk after specified time intervals, simply by looking at the file path.
- Use some kind of business logic to create the file paths, even if you aren't using it now. Some clients use substring calculations on dID and the like. Why not use department, location, author or something else that may actually be useful?
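The sizing advice and the dInDate idea can be checked with a little arithmetic and a short Python sketch. At 200 million objects and no more than 40,000 files per subdirectory, you need at least 200,000,000 / 40,000 = 5,000 leaf directories. The path layout (root/department/year/month/day/file) and the helper names are hypothetical. The point is only that once the date is in the path, a tiering job can decide what to move by parsing the path, without querying the repository.

```python
import math
from datetime import date

# Hypothetical helpers: the year/month/day split mirrors the dInDate idea
# described above, but the directory layout itself is an assumption.


def min_leaf_dirs(total_files: int, max_per_dir: int = 40_000) -> int:
    """Smallest number of leaf directories that keeps each one under the cap."""
    return math.ceil(total_files / max_per_dir)


def dated_path(root: str, department: str, in_date: date, filename: str) -> str:
    """Break the check-in date into year/month/day path segments."""
    return f"{root}/{department}/{in_date:%Y}/{in_date:%m}/{in_date:%d}/{filename}"


def is_due_for_archive(path: str, today: date, max_age_days: int) -> bool:
    """Decide tiering from the path alone: parse year/month/day and compare ages."""
    parts = path.split("/")
    year, month, day = (int(p) for p in parts[-4:-1])
    return (today - date(year, month, day)).days > max_age_days
```

For example, dated_path("/fs", "sales", date(2009, 3, 7), "123.pdf") gives /fs/sales/2009/03/07/123.pdf. Whether a single day's directory stays under 40,000 files depends on your daily ingest volume, so it is worth running the capacity check against your own numbers before settling on a layout.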