docs/modules/development/pages/support_scripts.adoc

   1 = Support Scripts =
   2 :toc:
   3
   4 Various scripts are included with Evergreen in the `/openils/bin/` directory
   5 (and in the source code in `Open-ILS/src/support-scripts` and
   6 `Open-ILS/src/extras`). Some of them are used during
   7 the installation process, such as `eg_db_config`, while others are usually
   8 run as cron jobs for routine maintenance, such as `fine_generator.pl` and
   9 `hold_targeter.pl`. Others are useful for less frequent needs, such as the
  10 scripts for importing/exporting MARC records. You may explore these scripts
  11 and adapt them for your local needs. You are also welcome to share your
  12 improvements or ask any questions on the
  13 http://evergreen-ils.org/communicate/[Evergreen IRC channel or email lists].
  14
  15 Here is a summary of the most commonly used scripts. The script name links
  16 to more thorough documentation, if available.
  17
  18  * action_trigger_aggregator.pl
  19    -- Groups together event output for already processed events.  Useful for
  20       creating files that contain data from a group of events, such as a CSV
  21       file with all the overdue data for one day.
  22  * xref:admin:actiontriggers_process.adoc#processing_action_triggers[action_trigger_runner.pl]
  23    -- Useful for creating events for specified hooks and running pending events
  24  * authority_authority_linker.pl
  25    -- Links reference headings in authority records to main entry headings
  26       in other authority records. Should be run at least once a day (only for
  27       changed records).
  28  * xref:#authority_control_fields[authority_control_fields.pl]
  29    -- Links bibliographic records to the best matching authority record.
  30       Should be run at least once a day (only for changed records).
  31       You can accomplish this by running `authority_control_fields.pl --days-back=1`.
  32  * autogen.sh
  33    -- Generates web files used by the OPAC, especially files related to
  34       the organization unit hierarchy, fieldmapper IDL, locale selection,
  35       facet definitions, compressed JS files, and the related cache key
  36  * clark-kent.pl
  37    -- Used to start and stop the reporter (which runs scheduled reports)
  38  * xref:installation:server_installation.adoc#creating_the_evergreen_database[eg_db_config]
  39    -- Creates database and schema, updates config files, sets Evergreen
  40       administrator username and password
  41  * fine_generator.pl
  42  * hold_targeter.pl
  43  * xref:#importing_authority_records_from_command_line[marc2are.pl]
  44    -- Converts authority records from MARC format to Evergreen objects
  45       suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
  46  * xref:#make_concerto_from_evergreen_db[make_concerto_from_evergreen_db.pl]
  47    -- This experimental script is responsible for generating the enhanced concerto
  48       dataset from a live Evergreen database.
  49  * marc2bre.pl
  50    -- Converts bibliographic records from MARC format to Evergreen objects
  51       suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
  52  * marc2sre.pl
  53    -- Converts serial records from MARC format to Evergreen objects
  54       suitable for importing via pg_loader.pl (or parallel_pg_loader.pl)
  55  * xref:#marc_export[marc_export]
  56    -- Exports authority, bibliographic, and serial holdings records into
  57       any of these formats: USMARC, UNIMARC, XML, BRE, ARE
  58  * osrf_control
  59    -- Used to start, stop and send signals to OpenSRF services
  60  * parallel_pg_loader.pl
  61    -- Uses the output of marc2bre.pl (or similar tools) to generate the SQL
  62       for importing records into Evergreen in a parallel fashion
  63
  64 [#authority_control_fields]
  65
  66 == authority_control_fields: Connecting Bibliographic and Authority records ==
  67
  68 indexterm:[authority control]
  69
  70 This script matches headings in bibliographic records to the appropriate
  71 authority records. When it finds a match, it will add a subfield 0 to the
  72 matching bibliographic field.
  73
  74 Here is how the matching works:
  75
  76 [options="header",cols="1,1,3"]
  77 |=========================================================
  78 |Bibliographic field|Authority field it matches|Subfields that it examines
  79
  80 |100|100|a,b,c,d,f,g,j,k,l,n,p,q,t,u
  81 |110|110|a,b,c,d,f,g,k,l,n,p,t,u
  82 |111|111|a,c,d,e,f,g,j,k,l,n,p,q,t,u
  83 |130|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t
  84 |600|100|a,b,c,d,f,g,h,j,k,l,m,n,o,p,q,r,s,t,v,x,y,z
  85 |610|110|a,b,c,d,f,g,h,k,l,m,n,o,p,r,s,t,v,w,x,y,z
  86 |611|111|a,c,d,e,f,g,h,j,k,l,n,p,q,s,t,v,x,y,z
  87 |630|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t,v,x,y,z
  88 |648|148|a,v,x,y,z
  89 |650|150|a,b,v,x,y,z
  90 |651|151|a,v,x,y,z
  91 |655|155|a,v,x,y,z
  92 |700|100|a,b,c,d,f,g,j,k,l,n,p,q,t,u
  93 |710|110|a,b,c,d,f,g,k,l,n,p,t,u
  94 |711|111|a,c,d,e,f,g,j,k,l,n,p,q,t,u
  95 |730|130|a,d,f,g,h,j,k,m,n,o,p,r,s,t
  96 |751|151|a,v,x,y,z
  97 |800|100|a,b,c,d,e,f,g,j,k,l,n,p,q,t,u,4
  98 |830|130|a,d,f,g,h,k,l,m,n,o,p,r,s,t
  99 |=========================================================
 100
 101 [#make_concerto_from_evergreen_db]
 102
 103 == make_concerto_from_evergreen_db.pl: Generating Evergreen enhanced datasets ==
 104
 105 This script makes it possible to continue to improve/maintain the Evergreen
 106 enhanced dataset. This script requires access to a Postgres database. It will
 107 automate the process of making the enhanced dataset match the current branch of
 108 Evergreen. You need to provide the login credentials for the database as well as
 109 the path to a local Evergreen repository that has the intended branch checked out.
 110
 111 This script has known bugs and should be considered experimental. Its output
 112 should be carefully reviewed before committing changes to Evergreen or
 113 opening a pull request for updating the dataset.
 114
 115 === Generate new dataset from existing DB ===
 116
 117 This command produces new SQL output from an existing database.
 118 It requires that you have also created a PostgreSQL database to serve as the "seed"
 119 database. The seed database is an Evergreen database created without data but
 120 from the branch of Evergreen that matches the dataset's branch.
 121
 122 [source,bash]
 123 ----
 124 ./make_concerto_from_evergreen_db.pl \
 125 --db-host localhost \
 126 --db-user evergreen \
 127 --db-pass evergreen \
 128 --db-port 5432 \
 129 --db-name eg_enhanced \
 130 --output-folder output \
 131 --seed-db-name seed_from_1326 \
 132 --evergreen-repo /home/opensrf/repos/Evergreen
 133 ----
 134
 135 If you don't have a seed database, you can omit that option, and the script will create one
 136 based on the version it finds in the file `<output_folder>/config.upgrade_log.sql`:
 137
 138 [source,bash]
 139 ----
 140 ./make_concerto_from_evergreen_db.pl \
 141 --db-host localhost \
 142 --db-user evergreen \
 143 --db-pass evergreen \
 144 --db-port 5432 \
 145 --db-name eg_enhanced \
 146 --output-folder output \
 147 --evergreen-repo /home/opensrf/repos/Evergreen
 148 ----
 149
 150 Or, you can have the script create a seed database and do nothing else.
 151 It takes the Evergreen version from `<output_folder>/config.upgrade_log.sql`:
 152
 153 [source,bash]
 154 ----
 155 ./make_concerto_from_evergreen_db.pl \
 156 --db-host localhost \
 157 --db-user evergreen \
 158 --db-pass evergreen \
 159 --db-port 5432 \
 160 --output-folder output \
 161 --evergreen-repo /home/opensrf/repos/Evergreen \
 162 --create-seed-db
 163 ----
 164
 165 Or, you can have the script create a seed database based on a version of Evergreen that you specify:
 166
 167 [source,bash]
 168 ----
 169 ./make_concerto_from_evergreen_db.pl \
 170 --db-host localhost \
 171 --db-user evergreen \
 172 --db-pass evergreen \
 173 --db-port 5432 \
 174 --output-folder output \
 175 --evergreen-repo /home/opensrf/repos/Evergreen \
 176 --create-seed-db \
 177 --seed-from-egdbid 1350
 178 ----
 179
 180 === Upgrade a previously-created dataset ===
 181
 182 Use this when cutting a new release of Evergreen and you want the enhanced
 183 dataset to match it. The script uses the current git branch of the repository at the provided path.
 184
 185 [source,bash]
 186 ----
 187 ./make_concerto_from_evergreen_db.pl \
 188 --db-host localhost \
 189 --db-user evergreen \
 190 --db-pass evergreen \
 191 --db-port 5432 \
 192 --output-folder output \
 193 --evergreen-repo /home/opensrf/repos/Evergreen \
 194 --perform-upgrade
 195 ----
 196
 197 === Test the existing dataset ===
 198
 199 Create a new database and restore the dataset.
 200 The software will first create a database that matches the version of Evergreen in the
 201 dataset output folder, then restore the dataset into the newly created database.
 202
 203 [source,bash]
 204 ----
 205 ./make_concerto_from_evergreen_db.pl \
 206 --db-host localhost \
 207 --db-user evergreen \
 208 --db-pass evergreen \
 209 --db-port 5432 \
 210 --output-folder output \
 211 --evergreen-repo /home/opensrf/repos/Evergreen \
 212 --test-restore
 213 ----
 214
 215 [#marc_export]
 216
 217 == marc_export: Exporting Bibliographic Records into MARC files ==
 218
 219 indexterm:[marc_export]
 220 indexterm:[MARC records,exporting,using the command line]
 221
 222 The following procedure explains how to export Evergreen bibliographic
 223 records into MARC files using the *marc_export* support script. All steps
 224 should be performed by the `opensrf` user from your Evergreen server.
 225
 226 [NOTE]
 227 Processing time for exporting records depends on several factors such as
 228 the number of records you are exporting. It is recommended that you divide
 229 the export ID files (records.txt) into a manageable number of records if
 230 you are exporting a large number of records.
 231
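One way to divide a large ID file, as the note above suggests, is the standard
`split` utility. The following is only a sketch: the function name
`split_id_file`, the default chunk size, and the `records_part_` prefix are
assumptions made for illustration and are not part of Evergreen. Each resulting
chunk can then be fed to `marc_export` in place of the full `records.txt`.

[source,bash]
----
# Split an ID file into chunks of at most CHUNK_SIZE lines each.
# Usage: split_id_file ID_FILE [CHUNK_SIZE] [PREFIX]
# Chunks are written next to the input file as <PREFIX>aa, <PREFIX>ab, ...
split_id_file() {
    id_file="$1"
    chunk_size="${2:-10000}"          # hypothetical default, tune as needed
    prefix="${3:-records_part_}"      # hypothetical chunk file prefix
    split -l "$chunk_size" "$id_file" "$(dirname "$id_file")/$prefix"
}

# Example: split_id_file /home/opensrf/records.txt 5000
----
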
 232  . Create a text file listing the IDs of the bibliographic records you would like
 233 to export from Evergreen. One way to do this is using SQL:
 234 +
 235 [source,sql]
 236 ----
 237 SELECT DISTINCT bre.id FROM biblio.record_entry AS bre
 238     JOIN asset.call_number AS acn ON acn.record = bre.id
 239     WHERE bre.deleted = 'false' AND acn.owning_lib = 101 \g /home/opensrf/records.txt
 240 ----
 241 +
 242 This query creates a file called `records.txt` containing a column of distinct
 243 IDs of bibliographic records with call numbers owned by the organizational unit with the ID 101.
 244
 245  . Navigate to the support-scripts folder
 246 +
 247 ----
 248 cd /home/opensrf/Evergreen-ILS*/Open-ILS/src/support-scripts/
 249 ----
 250
 251  . Run *marc_export*, using the ID file you created in step 1 to define which
 252    records to export. The following example exports the records into MARCXML format.
 253 +
 254 ----
 255 cat /home/opensrf/records.txt | ./marc_export --store -i -c /openils/conf/opensrf_core.xml \
 256     -x /openils/conf/fm_IDL.xml -f XML --timeout 5 > exported_files.xml
 257 ----
 258
 259 [NOTE]
 260 ====================
 261 `marc_export` does not output progress as it executes.
 262 ====================
 263
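Step 1 of the procedure above can also be scripted instead of typed into an
interactive `psql` session. The sketch below assumes the default user name,
host name, and database name of an Evergreen instance, and uses the `psql`
options `-A` (unaligned output), `-t` (rows only), and `-o` (output file) so
that the file holds one bare ID per line without a header or row count. The
function names and the `PSQL` override variable are hypothetical helpers, not
part of Evergreen.

[source,bash]
----
# Print the ID query for one owning organizational unit (numeric ID).
build_id_query() {
    case "$1" in
        ''|*[!0-9]*) echo "org unit ID must be numeric" >&2; return 1 ;;
    esac
    printf '%s\n' \
        "SELECT DISTINCT bre.id FROM biblio.record_entry AS bre" \
        "    JOIN asset.call_number AS acn ON acn.record = bre.id" \
        "    WHERE bre.deleted = 'false' AND acn.owning_lib = $1;"
}

# Run the query and write the IDs to OUT_FILE.
# Usage: write_id_file ORG_UNIT_ID OUT_FILE
write_id_file() {
    query=$(build_id_query "$1") || return 1
    printf '%s\n' "$query" |
        "${PSQL:-psql}" -U evergreen -h localhost -d evergreen -A -t -o "$2"
}

# Example: write_id_file 101 /home/opensrf/records.txt
----
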
 264 === Options ===
 265
 266 The *marc_export* support script includes several options.  You can find a complete list
 267 by running `./marc_export -h`.  A few key options are also listed below:
 268
 269 ==== --descendants and --library ====
 270
 271 The `marc_export` script has two related options, `--descendants` and
 272 `--library`.  Both options take one argument: the shortname of an organizational unit.
 273
 274 The `--library` option will export records with holdings at the specified
 275 organizational unit only.  By default, this only includes physical holdings,
 276 not electronic ones (also known as located URIs).
 277
 278 The `--descendants` option works much like the `--library` option
 279 except that it is aware of the org. tree and will export records with
 280 holdings at the specified organizational unit and all of its descendants.
 281 This is handy if you want to export the records for all of the branches
 282 of a system.  You can do that by specifying this option and the system's
 283 shortname, instead of specifying a separate `--library` option for each branch.
 284
 285 Both the `--library` and `--descendants` options can be repeated.
 286 All of the specified org. units and their descendants will be included
 287 in the output.  You can also combine `--library` and `--descendants`
 288 options when necessary.
 289
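As a rough illustration of the two options, the sketch below wraps each one in
a small shell function. The shortnames `SYS1` and `BR1` are hypothetical, as
are the function names and the `MARC_EXPORT` variable, which only exists here
so the path to the script can be overridden. Other options, such as the
configuration file options shown in the procedure above, are left out for
brevity.

[source,bash]
----
# Export records with holdings at an org unit and all of its descendants.
# Usage: export_system SHORTNAME OUT_FILE
export_system() {
    "${MARC_EXPORT:-./marc_export}" --descendants "$1" > "$2"
}

# Export records with holdings at exactly one org unit.
# Usage: export_branch SHORTNAME OUT_FILE
export_branch() {
    "${MARC_EXPORT:-./marc_export}" --library "$1" > "$2"
}

# Examples (hypothetical shortnames):
#   export_system SYS1 sys1.mrc
#   export_branch BR1 br1.mrc
----
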
 290 ==== --pipe ====
 291
 292 If you want to use the `--library` and `--descendants` options with a list
 293 of bib ids from standard input, you can make use of the `--pipe` option.
 294
 295 If you have a master list of bib ids, and only want to export bibs that have
 296 holdings from certain owning libraries then this option will help you reach
 297 that goal.
 298
 299 You cannot combine `--all` or `--since` with `--pipe`.
 300
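The master-list scenario described above might look like the following sketch,
which reads bib IDs from a file on standard input and keeps only records with
holdings at one library. The function name, the `MARC_EXPORT` override
variable, and the shortname in the example are assumptions for illustration.

[source,bash]
----
# Export only those records from a master ID list that have holdings
# at the given library.
# Usage: export_from_list ID_FILE SHORTNAME OUT_FILE
export_from_list() {
    id_file="$1"
    shortname="$2"
    out_file="$3"
    "${MARC_EXPORT:-./marc_export}" --pipe --library "$shortname" \
        < "$id_file" > "$out_file"
}

# Example (hypothetical shortname):
#   export_from_list /home/opensrf/records.txt BR1 br1_subset.mrc
----
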
 301 ==== --items ====
 302
 303 The `--items` option will add an 852 field for every relevant item to the MARC
 304 record.  This 852 field includes the following information:
 305
 306 [options="header",cols="2,3"]
 307 |===================================
 308 |Subfield          |Contents
 309 |$b (occurrence 1) |Call number owning library shortname
 310 |$b (occurrence 2) |Item circulating library shortname
 311 |$c                |Shelving location
 312 |$g                |Circulation modifier
 313 |$j                |Call number
 314 |$k                |Call number prefix
 315 |$m                |Call number suffix
 316 |$p                |Barcode
 317 |$s                |Status
 318 |$t                |Copy number
 319 |$x                |Miscellaneous item information
 320 |$y                |Price
 321 |===================================
 322
 323
 324 ==== --since ====
 325
 326 You can use the `--since` option to export records modified after a certain date and time.
 327
 328 ==== --store ====
 329
 330 By default, marc_export will use the reporter storage service, which should
 331 work in most cases. But if you have a separate reporter database and you
 332 know you want to talk directly to your main production database, then you
 333 can set the `--store` option to `cstore` or `storage`.
 334
 335 ==== --uris ====
 336 The `--uris` option (short form: `-u`) allows you to export records with
 337 located URIs (i.e. electronic resources).  When used by itself, it will export
 338 only records that have located URIs.  When used in conjunction with `--items`,
 339 it will add records with located URIs but no items/copies to the output.
 340 If combined with a `--library` or `--descendants` option, this option will
 341 limit its output to those records with URIs at the designated libraries.  The
 342 best way to use this option is in combination with `--items` and one of the
 343 `--library` or `--descendants` options to export *all* of a library's
 344 holdings, both physical and electronic.
 345
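The recommended combination of `--uris`, `--items`, and `--descendants` could
be invoked along these lines. This is a sketch only: the function name, the
`MARC_EXPORT` override variable, and the shortname in the example are
hypothetical.

[source,bash]
----
# Export all holdings, physical and electronic, for an org unit and
# its descendants.
# Usage: export_all_holdings SHORTNAME OUT_FILE
export_all_holdings() {
    "${MARC_EXPORT:-./marc_export}" --items --uris --descendants "$1" > "$2"
}

# Example (hypothetical shortname): export_all_holdings SYS1 sys1_all.mrc
----
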
 346 [#pingest_pl]
 347
 348 == Parallel Ingest with pingest.pl ==
 349
 350 indexterm:[pingest.pl]
 351 indexterm:[MARC records,importing,using the command line]
 352
 353 A program named pingest.pl allows fast bibliographic record
 354 ingest.  It performs ingest in parallel so that multiple batches can
 355 be done simultaneously.  It operates by splitting the records to be
 356 ingested up into batches and running all of the ingest methods on each
 357 batch.  You may pass in options to control how many batches are run at
 358 the same time, how many records there are per batch, and which ingest
 359 operations to skip.
 360
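To get a feel for the batching described above, the following sketch estimates
how many batches a run produces and the minimum number of rounds needed when
only a fixed number of batches run at once. It is back-of-the-envelope
arithmetic, not something pingest.pl reports; the function name and the record
count in the example are hypothetical, and the fallback values match the
documented defaults for `--batch-size` and `--max-child`.

[source,bash]
----
# Print "<batches> <rounds>" for a given number of records.
#   batches = ceil(records / batch_size)
#   rounds  = ceil(batches / max_child), a lower bound on sequential passes
# Usage: estimate_batches RECORDS [BATCH_SIZE] [MAX_CHILD]
estimate_batches() {
    records="$1"
    batch_size="${2:-10000}"
    max_child="${3:-8}"
    batches=$(( (records + batch_size - 1) / batch_size ))
    rounds=$(( (batches + max_child - 1) / max_child ))
    echo "$batches $rounds"
}

# Example: estimate_batches 250000    # prints "25 4"
----
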
 361 NOTE: The browse ingest is presently done in a single process over all
 362 of the input records as it cannot run in parallel with itself.  It
 363 does, however, run in parallel with the other ingests.
 364
 365 === Command Line Options ===
 366
 367 pingest.pl accepts the following command line options:
 368
 369 --host::
 370     The server where PostgreSQL runs (either host name or IP address).
 371     The default is read from the PGHOST environment variable or
 372     "localhost".
 373
 374 --port::
 375     The port that PostgreSQL listens to on host.  The default is read
 376     from the PGPORT environment variable or 5432.
 377
 378 --db::
 379     The database to connect to on the host.  The default is read from
 380     the PGDATABASE environment variable or "evergreen".
 381
 382 --user::
 383     The username for database connections.  The default is read from
 384     the PGUSER environment variable or "evergreen".
 385
 386 --password::
 387     The password for database connections.  The default is read from
 388     the PGPASSWORD environment variable or "evergreen".
 389
 390 --batch-size::
 391     Number of records to process per batch.  The default is 10,000.
 392
 393 --max-child::
 394     Max number of worker processes (i.e. the number of batches to
 395     process simultaneously).  The default is 8.
 396
 397 --skip-browse::
 398 --skip-attrs::
 399 --skip-search::
 400 --skip-facets::
 401 --skip-display::
 402     Skip the selected reingest component.
 403
 404 --attr::
 405     This option allows the user to specify which record attributes to reingest.
 406 It can be used one or more times to specify one or more attributes to
 407 ingest.  It can be omitted to reingest all record attributes.  This
 408 option is ignored if the `--skip-attrs` option is used.
 409 +
 410 The `--attr` option is most useful after doing something specific that
 411 requires only a partial ingest of records.  For instance, if you add a
 412 new language to the `config.coded_value_map` table, you will want to
 413 reingest the `item_lang` attribute on all of your records.  The
 414 following command line performs only that ingest:
 415 +
 416 ----
 417 $ /openils/bin/pingest.pl --skip-browse --skip-search --skip-facets \
 418     --skip-display --attr=item_lang
 419 ----
 420
 421 --rebuild-rmsr::
 422     This option will rebuild the `reporter.materialized_simple_record`
 423 (rmsr) table after the ingests are complete.
 424 +
 425 This option might prove useful if you want to rebuild the table as
 426 part of a larger reingest.  If all you wish to do is to rebuild the
 427 rmsr table, then it would be just as simple to connect to the database
 428 server and run the following SQL:
 429 +
 430 [source,sql]
 431 ----
 432 SELECT reporter.refresh_materialized_simple_record();
 433 ----
 434
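Putting several of the options above together, a run with smaller batches and
fewer workers than the defaults might be wrapped as in the sketch below. The
batch size and worker count shown are arbitrary example values, and the
function name and `PINGEST` override variable are hypothetical. The browse
ingest is skipped here purely as an example of a `--skip-*` option. The
password is deliberately not passed on the command line; it is left to the
PGPASSWORD environment variable.

[source,bash]
----
# Run pingest.pl with explicit batch and worker settings.
# Host and database fall back to the documented defaults when the
# PostgreSQL environment variables are not set.
# Usage: run_pingest [BATCH_SIZE] [MAX_CHILD]
run_pingest() {
    batch_size="${1:-5000}"     # example value, not the script default
    max_child="${2:-4}"         # example value, not the script default
    "${PINGEST:-/openils/bin/pingest.pl}" \
        --host="${PGHOST:-localhost}" --db="${PGDATABASE:-evergreen}" \
        --batch-size="$batch_size" --max-child="$max_child" --skip-browse
}

# Example: run_pingest 5000 4
----
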
 435
 436
 437
 438 [#importing_authority_records_from_command_line]
 439 == Importing Authority Records from Command Line ==
 440
 441 indexterm:[marc2are.pl]
 442 indexterm:[pg_loader.pl]
 443 indexterm:[MARC records,importing,using the command line]
 444
 445 The major advantages of the command line approach are its speed and its
 446 convenience for system administrators who can perform bulk loads of
 447 authority records in a controlled environment. For alternate instructions,
 448 see the cataloging manual.
 449
 450  . Run *marc2are.pl* against the authority records, specifying the user
 451 name, password, and MARC type (USMARC or XML). Use `STDOUT` redirection to
 452 either pipe the output directly into the next command or into an output
 453 file for inspection. For example, to process a file with authority records
 454 in MARCXML format named `auth_small.xml` using the default user name and
 455 password, and directing the output into a file named `auth.are`:
 456 +
 457 ----
 458 cd Open-ILS/src/extras/import/
 459 perl marc2are.pl --user admin --pass open-ils --marctype XML auth_small.xml > auth.are
 460 ----
 461 +
 462 [NOTE]
 463 The MARC type will default to USMARC if the `--marctype` option is not specified.
 464
 465  . Run *parallel_pg_loader.pl* to generate the SQL necessary for importing the
 466 authority records into your system. This script will create files in your
 467 current directory with filenames like `pg_loader-output.are.sql` and
 468 `pg_loader-output.sql` (which runs the previous SQL file). To continue with the
 469 previous example by processing our new `auth.are` file:
 470 +
 471 ----
 472 cd Open-ILS/src/extras/import/
 473 perl parallel_pg_loader.pl --auto are --order are auth.are
 474 ----
 475 +
 476 [TIP]
 477 To save time for very large batches of records, you could simply pipe the
 478 output of *marc2are.pl* directly into *parallel_pg_loader.pl*.
 479
 480  . Load the authority records from the SQL file that you generated in the
 481 last step into your Evergreen database using the psql tool. Assuming the
 482 default user name, host name, and database name for an Evergreen instance,
 483 that command looks like:
 484 +
 485 ----
 486 psql -U evergreen -h localhost -d evergreen -f pg_loader-output.sql
 487 ----
 488
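The piped variant mentioned in the tip above could be sketched as follows. It
reuses the user name, password, and file name from the earlier example and
skips the intermediate `auth.are` file; the SQL files are still written to the
current directory and loaded with `psql` as in the last step. The function
name and the `IMPORT_DIR` and `PERL` variables are hypothetical conveniences
for this illustration.

[source,bash]
----
# Convert MARCXML authority records and generate the loader SQL in one pass.
# Run from Open-ILS/src/extras/import/ or set IMPORT_DIR to that directory.
# Usage: generate_authority_sql MARC_FILE
generate_authority_sql() {
    marc_file="$1"
    import_dir="${IMPORT_DIR:-.}"
    "${PERL:-perl}" "$import_dir/marc2are.pl" \
            --user admin --pass open-ils --marctype XML "$marc_file" |
        "${PERL:-perl}" "$import_dir/parallel_pg_loader.pl" --auto are --order are
}

# Example: generate_authority_sql auth_small.xml
----
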
 489 == Juvenile-to-adult batch script ==
 490
 491 The batch `juv_to_adult.srfsh` script is responsible for toggling a patron
 492 from juvenile to adult. It should be set up as a cron job.
 493
 494 This script changes patrons to adult when they reach the age value set in the
 495 library setting named "Juvenile Age Threshold" (`global.juvenile_age_threshold`).
 496 When no library setting value is present at a given patron's home library, the
 497 value passed in to the script will be used as a default.
 498
 499 == MARC Stream Importer ==
 500
 501 indexterm:[MARC records,importing,using the command line]
 502
 503 The MARC Stream Importer can import authority records or bibliographic records.
 504 A single running instance of the script can import either type of record, based
 505 on the record leader.
 506
 507 This support script has its own configuration file, _marc_stream_importer.conf_,
 508 which includes settings related to logs, ports, users, and access control.
 509
 510 _marc_stream_importer.pl_ is typically located in the
 511 _/openils/bin_ directory, and _marc_stream_importer.conf_ is typically located
 512 in _/openils/conf_.
 513
 514 The importer is even more flexible than the staff client import, including the
 515 following options:
 516
 517  * _--bib-auto-overlay-exact_ and _--auth-auto-overlay-exact_: overlay/merge on
 518 exact 901c matches
 519  * _--bib-auto-overlay-1match_ and _--auth-auto-overlay-1match_: overlay/merge
 520 when exactly one match is found
 521  * _--bib-auto-overlay-best-match_ and _--auth-auto-overlay-best-match_:
 522 overlay/merge on best match
 523  * _--bib-import-no-match_ and _--auth-import-no-match_: import when no match
 524 is found
 525
 526 One advantage to using this tool instead of the staff client Import interface
 527 is that the MARC Stream Importer can load a group of files at once.
 528