   1 = Support Scripts =
   2 :toc:
   3
   4 Various scripts are included with Evergreen in the `/openils/bin/` directory
   5 (and in the source code in `Open-ILS/src/support-scripts` and
   6 `Open-ILS/src/extras`). Some of them are used during
   7 the installation process, such as `eg_db_config`, while others are usually
   8 run as cron jobs for routine maintenance, such as `fine_generator.pl` and
   9 `hold_targeter.pl`. Others are useful for less frequent needs, such as the
  10 scripts for importing/exporting MARC records. You may explore these scripts
  11 and adapt them for your local needs. You are also welcome to share your
  12 improvements or ask any questions on the
  13 http://evergreen-ils.org/communicate/[Evergreen IRC channel or email lists].
  14
  15 Here is a summary of the most commonly used scripts. The script name links
  16 to more thorough documentation, if available.
  17
 * action_trigger_aggregator.pl
   -- Groups together event output for already processed events. Useful for
      creating files that contain data from a group of events, such as a CSV
      file with all the overdue data for one day.
  22  * xref:admin:actiontriggers_process.adoc#processing_action_triggers[action_trigger_runner.pl]
  23    -- Useful for creating events for specified hooks and running pending events
 * authority_authority_linker.pl
   -- Links reference headings in authority records to main entry headings
      in other authority records. Should be run at least once a day (only for
      changed records).
  28  * xref:#authority_control_fields[authority_control_fields.pl]
  29    -- Links bibliographic records to the best matching authority record.
  30       Should be run at least once a day (only for changed records).
  31       You can accomplish this by running _authority_control_fields.pl --days-back=1_
  32  * autogen.sh
  33    -- Generates web files used by the OPAC, especially files related to
  34       organization unit hierarchy, fieldmapper IDL, locales selection,
  35       facet definitions, compressed JS files and related cache key
  36  * clark-kent.pl
  37    -- Used to start and stop the reporter (which runs scheduled reports)
  38  * xref:installation:server_installation.adoc#creating_the_evergreen_database[eg_db_config]
  39    -- Creates database and schema, updates config files, sets Evergreen
  40       administrator username and password
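 * fine_generator.pl
   -- Generates fines for overdue items; usually run as a cron job
 * hold_targeter.pl
   -- Targets holds, finding items that can fill them; usually run as a cron job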
  43  * xref:#importing_authority_records_from_command_line[marc2are.pl]
  44    -- Converts authority records from MARC format to Evergreen objects
  45       suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
  46  * xref:#make_concerto_from_evergreen_db[make_concerto_from_evergreen_db.pl]
  47    -- This script is responsible for generating the enhanced concerto
  48       dataset from a live Evergreen database.
  49  * marc2bre.pl
  50    -- Converts bibliographic records from MARC format to Evergreen objects
  51       suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
  52  * marc2sre.pl
  53    -- Converts serial records from MARC format to Evergreen objects
  54       suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
  55  * xref:#marc_export[marc_export]
  56    -- Exports authority, bibliographic, and serial holdings records into
  57       any of these formats: USMARC, UNIMARC, XML, BRE, ARE
  58  * osrf_control
  59    -- Used to start, stop and send signals to OpenSRF services
  60  * parallel_pg_loader.pl
  61    -- Uses the output of marc2bre.pl (or similar tools) to generate the SQL
  62       for importing records into Evergreen in a parallel fashion
  63
  64 [#authority_control_fields]
  65
  66 == authority_control_fields: Connecting Bibliographic and Authority records ==
  67
  68 indexterm:[authority control]
  69
  70 This script matches headings in bibliographic records to the appropriate
  71 authority records. When it finds a match, it will add a subfield 0 to the
  72 matching bibliographic field.
  73
  74 Here is how the matching works:
  75
  76 [options="header",cols="1,1,3"]
  77 |=========================================================
  78 |Bibliographic field|Authority field it matches|Subfields that it examines
  79
  80 |100|100|a,b,c,d,f,g,j,k,l,n,p,q,t,u
  81 |110|110|a,b,c,d,f,g,k,l,n,p,t,u
  82 |111|111|a,c,d,e,f,g,j,k,l,n,p,q,t,u
  83 |130|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t
  84 |600|100|a,b,c,d,f,g,h,j,k,l,m,n,o,p,q,r,s,t,v,x,y,z
  85 |610|110|a,b,c,d,f,g,h,k,l,m,n,o,p,r,s,t,v,w,x,y,z
  86 |611|111|a,c,d,e,f,g,h,j,k,l,n,p,q,s,t,v,x,y,z
  87 |630|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t,v,x,y,z
  88 |648|148|a,v,x,y,z
  89 |650|150|a,b,v,x,y,z
  90 |651|151|a,v,x,y,z
  91 |655|155|a,v,x,y,z
  92 |700|100|a,b,c,d,f,g,j,k,l,n,p,q,t,u
  93 |710|110|a,b,c,d,f,g,k,l,n,p,t,u
  94 |711|111|a,c,d,e,f,g,j,k,l,n,p,q,t,u
  95 |730|130|a,d,f,g,h,j,k,m,n,o,p,r,s,t
  96 |751|151|a,v,x,y,z
  97 |800|100|a,b,c,d,e,f,g,j,k,l,n,p,q,t,u,4
  98 |830|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t
  99 |=========================================================
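
As noted in the summary at the top of this page, the script should run at least
once a day with `--days-back=1`. The following is a minimal wrapper sketch that
a cron job could call. The function name, the `AUTH_LINKER` variable, and the
schedule in the comment are illustrative assumptions; only the
`/openils/bin` location and the `--days-back` option come from this page.

[source,bash]
----
#!/bin/bash
# Hypothetical wrapper; AUTH_LINKER can be overridden if the script lives elsewhere.
AUTH_LINKER=${AUTH_LINKER:-/openils/bin/authority_control_fields.pl}

# Link bibliographic records changed in the last N days (default: 1).
relink_recent() {
    local days=${1:-1}
    "$AUTH_LINKER" --days-back="$days"
}

# Example crontab line for the opensrf user (the schedule and the wrapper's
# file name are assumptions):
#   30 1 * * * . /home/opensrf/relink_authorities.sh && relink_recent 1
----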
 100
 101 [#make_concerto_from_evergreen_db]
 102
== make_concerto_from_evergreen_db.pl: Generating Evergreen enhanced datasets ==
 104
This script makes it possible to keep improving and maintaining the Evergreen
enhanced dataset. It requires two databases: one created with only Evergreen's
seed data, and another that contains the intended dataset for the enhanced
concerto data.
 109
=== Set up your server environment ===

==== Creating the seed-only database ====
 113
 114 [NOTE]
 115 Follow the standard PostgreSQL user creation steps from Evergreen's installation
 116 instructions.
 117
[source,bash]
----
perl Open-ILS/src/support-scripts/eg_db_config \
       --service all --create-database --create-schema \
       --user evergreen --password evergreen --hostname 127.0.0.1 --port 5432 \
       --database ref_db --admin-user admin --admin-pass demo123
----
 125
 126 ==== Creating the database containing the enhanced dataset ====
 127
This is the database containing the data that will ultimately be
generated. In this example, we'll use the concerto dataset.
 130
[source,bash]
----
perl Open-ILS/src/support-scripts/eg_db_config \
       --service all --create-database --create-schema --load-all-sample \
       --user evergreen --password evergreen --hostname 127.0.0.1 --port 5432 \
       --database evergreen --admin-user admin --admin-pass demo123
----
 138
 139 [NOTE]
At this point, you can edit the Evergreen database through the staff client or directly in the database.
 141
 142 === Run the script ===
 143
Now that the database is ready to become the enhanced dataset, we need to make
sure that the database connection details are set up in two files. By default,
this script uses `/openils/conf/opensrf.xml` for connection information to the
enhanced dataset database. You also need to provide the path to another
XML file with the details on how to connect to the seed-only database.
 149
Example of the minimum XML required for connection to the enhanced dataset
database (`opensrf.xml`):

[source,xml]
----
<opensrf version='0.0.3'>
    <default>
        <apps>
            <open-ils.storage>
                <app_settings>
                    <databases>
                        <database>
                            <user>evergreen</user>
                            <host>127.0.0.1</host>
                            <port>5432</port>
                            <pw>evergreen</pw>
                            <db>evergreen</db>
                        </database>
                    </databases>
                </app_settings>
            </open-ils.storage>
        </apps>
    </default>
</opensrf>
----
 176
 177
Example of the minimum XML required for connection to the seed-only database
(`seedonly.xml`):

[source,xml]
----
<opensrf version='0.0.3'>
    <default>
        <apps>
            <open-ils.storage>
                <app_settings>
                    <databases>
                        <database>
                            <user>evergreen</user>
                            <host>127.0.0.1</host>
                            <port>5432</port>
                            <pw>evergreen</pw>
                            <db>ref_db</db>
                        </database>
                    </databases>
                </app_settings>
            </open-ils.storage>
        </apps>
    </default>
</opensrf>
----
 204
 205 And we're all set.
 206
[source,bash]
----
mkdir output
./make_concerto_from_evergreen_db.pl --xmlseed /openils/conf/seedonly.xml --output-folder output
----
 212
The script needs to know which data is seed data and which is not. It compares the data in each
table against the seed-only database to determine what needs to be output.
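
The comparison described above amounts to a set difference: keep the rows that
exist in the enhanced database but not in the seed-only one. The sketch below
illustrates that idea on two plain-text row dumps. It is not the script's actual
implementation; the function name and the pipe-delimited dump format are
assumptions made for the example.

[source,bash]
----
# Print every line of ENHANCED_DUMP that does not appear verbatim in SEED_DUMP.
# -F: treat seed lines as fixed strings, -x: match whole lines only,
# -v: invert the match, -f: read the patterns from the seed dump.
rows_not_in_seed() {
    local seed_dump=$1 enhanced_dump=$2
    grep -Fxv -f "$seed_dump" "$enhanced_dump"
}

# Usage with hypothetical dumps of the same table from each database:
#   rows_not_in_seed seed_rows.txt enhanced_rows.txt > rows_to_output.txt
----

Row order does not matter here, because each line is matched as a whole against
the full set of seed lines.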
 215
 216
 217 [#marc_export]
 218
 219 == marc_export: Exporting Bibliographic Records into MARC files ==
 220
 221 indexterm:[marc_export]
 222 indexterm:[MARC records,exporting,using the command line]
 223
 224 The following procedure explains how to export Evergreen bibliographic
 225 records into MARC files using the *marc_export* support script. All steps
 226 should be performed by the `opensrf` user from your Evergreen server.
 227
 228 [NOTE]
 229 Processing time for exporting records depends on several factors such as
 230 the number of records you are exporting. It is recommended that you divide
 231 the export ID files (records.txt) into a manageable number of records if
 232 you are exporting a large number of records.
 233
 234  . Create a text file list of the Bibliographic record IDs you would like
 235 to export from Evergreen. One way to do this is using SQL:
 236 +
[source,sql]
----
SELECT DISTINCT bre.id FROM biblio.record_entry AS bre
    JOIN asset.call_number AS acn ON acn.record = bre.id
    WHERE bre.deleted = FALSE AND acn.owning_lib = 101 \g /home/opensrf/records.txt
----
+
This query, run from `psql`, creates a file called `records.txt` containing a
column of distinct IDs of bibliographic records that have call numbers owned by
the organizational unit with the ID 101. The `\g` meta-command both ends the
query and names the output file, so no trailing semicolon is needed.
 246
 247  . Navigate to the support-scripts folder
 248 +
----
cd /home/opensrf/Evergreen-ILS*/Open-ILS/src/support-scripts/
----
 252
 . Run *marc_export*, using the ID file you created in step 1 to define which
   records to export. The following example exports the records into MARCXML format.
 255 +
----
cat /home/opensrf/records.txt | ./marc_export --store -i -c /openils/conf/opensrf_core.xml \
    -x /openils/conf/fm_IDL.xml -f XML --timeout 5 > exported_files.xml
----
 260
 261 [NOTE]
 262 ====================
 263 `marc_export` does not output progress as it executes.
 264 ====================
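
The earlier note recommends dividing a large ID file into manageable pieces. One
way to do that is with the standard `split` utility, as in this sketch. The
function name, the default chunk size of 10,000 IDs, and the `records_part_`
prefix are assumptions chosen for illustration.

[source,bash]
----
# Split an ID file into chunks of at most CHUNK_SIZE lines each.
# Chunks are named PREFIXaa, PREFIXab, PREFIXac, and so on.
split_ids() {
    local id_file=$1 chunk_size=${2:-10000} prefix=${3:-records_part_}
    split -l "$chunk_size" "$id_file" "$prefix"
}

# Usage: split the list, then run the export from step 3 once per chunk,
# replacing records.txt with the chunk name each time.
#   split_ids /home/opensrf/records.txt 10000 /home/opensrf/records_part_
----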
 265
 266 === Options ===
 267
 268 The *marc_export* support script includes several options.  You can find a complete list
 269 by running `./marc_export -h`.  A few key options are also listed below:
 270
 271 ==== --descendants and --library ====
 272
The `marc_export` script has two related options, `--descendants` and
`--library`.  Both options take one argument: an organizational unit shortname.
 275
 276 The `--library` option will export records with holdings at the specified
 277 organizational unit only.  By default, this only includes physical holdings,
 278 not electronic ones (also known as located URIs).
 279
The `--descendants` option works much like the `--library` option
 281 except that it is aware of the org. tree and will export records with
 282 holdings at the specified organizational unit and all of its descendants.
 283 This is handy if you want to export the records for all of the branches
 284 of a system.  You can do that by specifying this option and the system's
 285 shortname, instead of specifying multiple `--library` options for each branch.
 286
 287 Both the `--library` and `--descendants` options can be repeated.
 288 All of the specified org. units and their descendants will be included
 289 in the output.  You can also combine `--library` and `--descendants`
 290 options when necessary.
 291
 292 ==== --pipe ====
 293
 294 If you want to use the `--library` and `--descendants` options with a list
 295 of bib ids from standard input, you can make use of the `--pipe` option.
 296
 297 If you have a master list of bib ids, and only want to export bibs that have
 298 holdings from certain owning libraries then this option will help you reach
 299 that goal.
 300
You cannot combine `--all` or `--since` with `--pipe`.
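
As a sketch of the master-list case described above, the function below filters
a list of bib IDs from standard input down to the records held at one library.
The function name, the `MARC_EXPORT` variable, and the `BR1` shortname in the
usage comment are placeholders. The configuration options shown in the export
procedure are omitted here on the assumption that the defaults apply.

[source,bash]
----
# Path to marc_export; override if it is installed somewhere else.
MARC_EXPORT=${MARC_EXPORT:-/openils/bin/marc_export}

# Read bib IDs on standard input and write MARCXML for those with
# holdings at the given library shortname.
export_listed_bibs_for_library() {
    local shortname=$1
    "$MARC_EXPORT" --pipe --library "$shortname" -f XML
}

# Usage (BR1 is a placeholder shortname):
#   export_listed_bibs_for_library BR1 < /home/opensrf/records.txt > br1_records.xml
----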
 302
 303 ==== --items ====
 304
 305 The `--items` option will add an 852 field for every relevant item to the MARC
 306 record.  This 852 field includes the following information:
 307
 308 [options="header",cols="2,3"]
 309 |===================================
 310 |Subfield          |Contents
 311 |$b (occurrence 1) |Call number owning library shortname
 312 |$b (occurrence 2) |Item circulating library shortname
 313 |$c                |Shelving location
 314 |$g                |Circulation modifier
 315 |$j                |Call number
 316 |$k                |Call number prefix
 317 |$m                |Call number suffix
 318 |$p                |Barcode
 319 |$s                |Status
 320 |$t                |Copy number
 321 |$x                |Miscellaneous item information
 322 |$y                |Price
 323 |===================================
 324
 325
 326 ==== --since ====
 327
 328 You can use the `--since` option to export records modified after a certain date and time.
 329
 330 ==== --store ====
 331
 332 By default, marc_export will use the reporter storage service, which should
 333 work in most cases. But if you have a separate reporter database and you
 334 know you want to talk directly to your main production database, then you
 335 can set the `--store` option to `cstore` or `storage`.
 336
 337 ==== --uris ====
The `--uris` option (short form: `-u`) allows you to export records with
 339 located URIs (i.e. electronic resources).  When used by itself, it will export
 340 only records that have located URIs.  When used in conjunction with `--items`,
 341 it will add records with located URIs but no items/copies to the output.
 342 If combined with a `--library` or `--descendants` option, this option will
 343 limit its output to those records with URIs at the designated libraries.  The
 344 best way to use this option is in combination with the `--items` and one of the
 345 `--library` or `--descendants` options to export *all* of a library's
 346 holdings both physical and electronic.
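
The combination recommended above can be wrapped in a small helper, sketched
here. The function name, the `MARC_EXPORT` variable, and the `SYS1` shortname in
the usage comment are placeholders rather than part of Evergreen; the options
themselves are the ones described in this section.

[source,bash]
----
# Path to marc_export; override if it is installed somewhere else.
MARC_EXPORT=${MARC_EXPORT:-/openils/bin/marc_export}

# Export all holdings, physical and electronic, for an org unit and
# everything beneath it in the org tree, as MARCXML on standard output.
export_system_holdings() {
    local shortname=$1
    "$MARC_EXPORT" --descendants "$shortname" --items --uris -f XML
}

# Usage (SYS1 is a placeholder system shortname):
#   export_system_holdings SYS1 > sys1_holdings.xml
----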
 347
 348 [#pingest_pl]
 349
 350 == Parallel Ingest with pingest.pl ==
 351
indexterm:[pingest.pl]
 353 indexterm:[MARC records,importing,using the command line]
 354
 355 A program named pingest.pl allows fast bibliographic record
 356 ingest.  It performs ingest in parallel so that multiple batches can
 357 be done simultaneously.  It operates by splitting the records to be
 358 ingested up into batches and running all of the ingest methods on each
 359 batch.  You may pass in options to control how many batches are run at
 360 the same time, how many records there are per batch, and which ingest
 361 operations to skip.
 362
 363 NOTE: The browse ingest is presently done in a single process over all
 364 of the input records as it cannot run in parallel with itself.  It
 365 does, however, run in parallel with the other ingests.
 366
 367 === Command Line Options ===
 368
 369 pingest.pl accepts the following command line options:
 370
--host::
    The server where PostgreSQL runs (either host name or IP address).
    The default is read from the PGHOST environment variable or
    `localhost`.

--port::
    The port that PostgreSQL listens to on host.  The default is read
    from the PGPORT environment variable or `5432`.

--db::
    The database to connect to on the host.  The default is read from
    the PGDATABASE environment variable or `evergreen`.

--user::
    The username for database connections.  The default is read from
    the PGUSER environment variable or `evergreen`.

--password::
    The password for database connections.  The default is read from
    the PGPASSWORD environment variable or `evergreen`.
 391
 392 --batch-size::
 393     Number of records to process per batch.  The default is 10,000.
 394
 395 --max-child::
 396     Max number of worker processes (i.e. the number of batches to
 397     process simultaneously).  The default is 8.
 398
 399 --skip-browse::
 400 --skip-attrs::
 401 --skip-search::
 402 --skip-facets::
 403 --skip-display::
 404     Skip the selected reingest component.
 405
 406 --attr::
 407     This option allows the user to specify which record attributes to reingest.
 408 It can be used one or more times to specify one or more attributes to
 409 ingest.  It can be omitted to reingest all record attributes.  This
 410 option is ignored if the `--skip-attrs` option is used.
 411 +
 412 The `--attr` option is most useful after doing something specific that
 413 requires only a partial ingest of records.  For instance, if you add a
 414 new language to the `config.coded_value_map` table, you will want to
reingest the `item_lang` attribute on all of your records.  The
following command line performs that ingest and nothing else:
 417 +
----
$ /openils/bin/pingest.pl --skip-browse --skip-search --skip-facets \
    --skip-display --attr=item_lang
----
 422
 423 --rebuild-rmsr::
 424     This option will rebuild the `reporter.materialized_simple_record`
 425 (rmsr) table after the ingests are complete.
 426 +
 427 This option might prove useful if you want to rebuild the table as
 428 part of a larger reingest.  If all you wish to do is to rebuild the
 429 rmsr table, then it would be just as simple to connect to the database
 430 server and run the following SQL:
 431 +
[source,sql]
----
SELECT reporter.refresh_materialized_simple_record();
----
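
To get a feel for `--batch-size` and `--max-child`, it can help to estimate how
many batches a run will produce. The sketch below does that arithmetic; the
function name and the record count in the usage comment are hypothetical, and
the calculation is only the ceiling division implied by the option
descriptions above, not code taken from `pingest.pl`.

[source,bash]
----
# Number of batches needed for RECORDS records at BATCH_SIZE records per
# batch (integer ceiling division). BATCH_SIZE defaults to 10000.
batch_count() {
    local records=$1 batch_size=${2:-10000}
    echo $(( (records + batch_size - 1) / batch_size ))
}

# Usage: with a hypothetical 250,000 records and the default batch size,
# batch_count 250000 reports 25 batches. With the default --max-child of 8,
# at most 8 of those batches are processed at any one time.
----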
 436
 437
 438
 439
 440 [#importing_authority_records_from_command_line]
 441 == Importing Authority Records from Command Line ==
 442
 443 indexterm:[marc2are.pl]
 444 indexterm:[pg_loader.pl]
 445 indexterm:[MARC records,importing,using the command line]
 446
 447 The major advantages of the command line approach are its speed and its
 448 convenience for system administrators who can perform bulk loads of
 449 authority records in a controlled environment. For alternate instructions,
 450 see the cataloging manual.
 451
 . Run *marc2are.pl* against the authority records, specifying the user
name, password, and MARC type (USMARC or XML). Use `STDOUT` redirection to
 454 either pipe the output directly into the next command or into an output
 455 file for inspection. For example, to process a file with authority records
 456 in MARCXML format named `auth_small.xml` using the default user name and
 457 password, and directing the output into a file named `auth.are`:
 458 +
----
cd Open-ILS/src/extras/import/
perl marc2are.pl --user admin --pass open-ils --marctype XML auth_small.xml > auth.are
----
 463 +
 464 [NOTE]
 465 The MARC type will default to USMARC if the `--marctype` option is not specified.
 466
 467  . Run *parallel_pg_loader.pl* to generate the SQL necessary for importing the
 468 authority records into your system. This script will create files in your
 469 current directory with filenames like `pg_loader-output.are.sql` and
 470 `pg_loader-output.sql` (which runs the previous SQL file). To continue with the
 471 previous example by processing our new `auth.are` file:
 472 +
----
# Run from the same Open-ILS/src/extras/import/ directory as the previous step
perl parallel_pg_loader.pl --auto are --order are auth.are
----
 477 +
 478 [TIP]
 479 To save time for very large batches of records, you could simply pipe the
 480 output of *marc2are.pl* directly into *parallel_pg_loader.pl*.
 481
 482  . Load the authority records from the SQL file that you generated in the
 483 last step into your Evergreen database using the psql tool. Assuming the
 484 default user name, host name, and database name for an Evergreen instance,
 485 that command looks like:
 486 +
----
psql -U evergreen -h localhost -d evergreen -f pg_loader-output.sql
----
 490
 491 == Juvenile-to-adult batch script ==
 492
 493 The batch `juv_to_adult.srfsh` script is responsible for toggling a patron
 494 from juvenile to adult. It should be set up as a cron job.
 495
 496 This script changes patrons to adult when they reach the age value set in the
 497 library setting named "Juvenile Age Threshold" (`global.juvenile_age_threshold`).
 498 When no library setting value is present at a given patron's home library, the
 499 value passed in to the script will be used as a default.
 500
 501 == MARC Stream Importer ==
 502
 503 indexterm:[MARC records,importing,using the command line]
 504
 505 The MARC Stream Importer can import authority records or bibliographic records.
 506 A single running instance of the script can import either type of record, based
 507 on the record leader.
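
In MARC 21, the record type is stored in the leader at position 06, where `z`
marks authority data. The sketch below shows how a record could be classified
from its leader on that basis. It illustrates the general idea only and is not
the importer's own code; the function name is hypothetical, and treating every
other value as bibliographic is a simplification.

[source,bash]
----
# Classify a MARC record from its 24-character leader.
# Leader/06 is the seventh character when counting from one.
marc_record_type() {
    local leader=$1
    case "$(printf '%s' "$leader" | cut -c7)" in
        z) echo authority ;;
        *) echo bibliographic ;;
    esac
}

# Usage:
#   marc_record_type "00000nz  a2200000n  4500"
#   marc_record_type "00000nam a2200000 a 4500"
----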
 508
 509 This support script has its own configuration file, _marc_stream_importer.conf_,
which includes settings related to logs, ports, users, and access control.
 511
_marc_stream_importer.pl_ is typically located in the _/openils/bin_
directory, and _marc_stream_importer.conf_ in _/openils/conf_.
 515
 516 The importer is even more flexible than the staff client import, including the
 517 following options:
 518
 519  * _--bib-auto-overlay-exact_ and _--auth-auto-overlay-exact_: overlay/merge on
 520 exact 901c matches
 521  * _--bib-auto-overlay-1match_ and _--auth-auto-overlay-1match_: overlay/merge
 522 when exactly one match is found
 523  * _--bib-auto-overlay-best-match_ and _--auth-auto-overlay-best-match_:
 524 overlay/merge on best match
 525  * _--bib-import-no-match_ and _--auth-import-no-match_: import when no match
 526 is found
 527
 528 One advantage to using this tool instead of the staff client Import interface
 529 is that the MARC Stream Importer can load a group of files at once.
 530