Commit 97d52d0b authored by Charlotte Hausman

update known issues documentation page

parent ad69c070
1 merge request: !827 update known issues documentation page
Pipeline #4518 passed
### Before Deployment
* Ensure all changes made to the RC branch during testing have been added back into main.
* Create Release Notes
* Update <span style="color:cornflowerblue">dsoc-prod</span> Capo Properties
* Back up Production Database (If deploying in conjunction with AAT, the AAT procedure covers this step)
in the workspaces/sbin/ area.
* Smoke test production
* Send 'All Clear' notification to <span style="color:cornflowerblue">ssa-announcements@nrao.edu</span>
* Merge release candidate branch back into main branch to pick up any fixes made during testing

Bugs
====

System as a Whole
-----------------

Version 2.0.0
^^^^^^^^^^^^^

- Occasionally, messages are held by AMQP, which prevents Capability Request updates and
  state machine transitions. This appears to occur most frequently on the test system.

Messaging System
----------------

Version 2.0.0
^^^^^^^^^^^^^

- Occasionally, messages are held by AMQP. This appears to occur most frequently on the test system.

Capability System
-----------------

Version 2.0.0
^^^^^^^^^^^^^

- State transitions and actions can be affected by the AMQP message-holding issue described
  under Messaging System.
- The Auto-calibration system contains an error in the request submission endpoint. Incoming archive
  ingestion events auto-create new Calibration Capability Requests, but those requests must be
  submitted manually from the active requests page. This will be fixed in Version 2.5.0.

Workflow System
---------------

No known issues.

Notification System
-------------------

No known issues.

Gripes
======

Docker
------

- Add a sidecar for visibility into which containers are running and their `docker logs`
  (possibly using the Prometheus integration built into GitLab) - `WS-425 <https://open-jira.nrao.edu/browse/WS-425>`__

Condor
------

- Move the data copy plugin from SCG into the repo - `WS-415 <https://open-jira.nrao.edu/browse/WS-415>`__
- Update `wf_monitor` to recognize additional HTCondor status codes - `WS-413 <https://open-jira.nrao.edu/browse/WS-413>`__

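
As a rough illustration of the `wf_monitor` item, the sketch below pulls event codes out of an HTCondor user log. It assumes the classic text log format, in which each event header starts with a three-digit event code followed by the job ID in parentheses. The subset of codes listed and the choice of which events count as terminal are assumptions for this example, not what `wf_monitor` currently does. Unknown codes are passed through as plain integers instead of raising, so an unfamiliar event cannot crash the monitor.

.. code-block:: python

    import re
    from enum import IntEnum


    class EventCode(IntEnum):
        """Subset of HTCondor user-log event codes (extend as needed)."""
        SUBMIT = 0
        EXECUTE = 1
        JOB_EVICTED = 4
        JOB_TERMINATED = 5
        JOB_ABORTED = 9
        JOB_HELD = 12
        JOB_RELEASED = 13


    # Assumption for this sketch: after these events the job produces no further events
    TERMINAL = {EventCode.JOB_TERMINATED, EventCode.JOB_ABORTED}

    # An event header looks like "005 (042.000.000) 2021-03-08 10:55:00 Job terminated."
    HEADER = re.compile(r"^(\d{3}) \((\d+)\.(\d+)\.\d+\)", re.MULTILINE)


    def parse_events(log_text: str):
        """Yield (code, cluster, proc) for each event header in the log text.

        Codes missing from EventCode are yielded as plain ints instead of raising.
        """
        for match in HEADER.finditer(log_text):
            raw = int(match.group(1))
            try:
                code = EventCode(raw)
            except ValueError:
                code = raw  # unrecognized code: surface it rather than crash
            yield code, int(match.group(2)), int(match.group(3))

Matching on the header line alone keeps the parser independent of each event's free-form body text, which varies between event types.
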
Docs
----

- Update the setup-for-development page for Docker containers

  - Move that information into the installation page

- Update the README.md files to say what each one is attached to
- Integrate the README.md files into the docs, possibly into the API docs themselves
- `WS-428 <https://open-jira.nrao.edu/browse/WS-428>`__

Testing
-------

- Packages that need testing:

  1) `workspaces.capability`

     i) `schema`
     ii) `capability_service` - Unfinished
     iii) `execution_manager` - Essentially unimplemented (all tests are skipped)
     iv) `capability_info` - Almost finished; needs 7 more tests
     v) `queue_manager`

  2) `workspaces.workflow`

     i) `schema`
     ii) `workflow_service` - Essentially unimplemented (all tests are skipped)
     iii) `workflow_info`

  3) `workspaces.notification`

     i) `schema`
     ii) `notification_service`

  4) `workspaces.system`

     i) `schema`
     ii) `views`
     iii) `notification_service`

- Fix and re-enable the end-to-end tests that Nathan disabled because they are
  hard-coded to expect the redirect to the request page
- Add schema migrations to CI
- `WS-435 <https://open-jira.nrao.edu/browse/WS-435>`__

Database
--------

- Add permanent data from ~10 small projects to the CoreSampler for local testing
- Consider moving from the `json` data type to `jsonb`
- `WS-441 <https://open-jira.nrao.edu/browse/WS-441>`__

Pipeline
--------

- Update the end-to-end test container with new tests for all new pages and features
- Prevent the `cleanup` stage from deleting tagged images while multiple pipelines are running;
  this currently causes the `push` stage to fail
- Raise the concurrent pipeline job limit (currently 6)
- Add a rule that cancels an MR's pipelines when a newer pipeline for that MR starts running
- Add a "Testing on dev" flag that, while set, prevents pipelines (or perhaps just the deploy step)
  from running on main; they would queue up and run once the flag is unset

Code Tweaks
-----------

- Support more HTCondor event codes in `wf_monitor` and handle them throughout the system

  .. code-block:: python

      # Enum example by Daniel
      from enum import Enum

      class HTCondorEvent(Enum):
          SUBMITTED = (0, 'submitted', False)
          EXECUTING = (1, 'executing', False)
          # ...
          TERMINATED = (5, 'terminated', True)

          def __init__(self, code: int, meaning: str, terminator: bool):
              self.code, self.meaning, self.terminator = code, meaning, terminator

          @classmethod
          def from_code(cls, code: int) -> 'HTCondorEvent | None':
              # HTCondorEvent[...] looks up by *name*, so decode by code with a search (None if unlisted)
              return next((event for event in cls if event.code == code), None)

      # Decoding looks like HTCondorEvent.from_code(code); then ask e.g. `if event.terminator: ...`

- The calibration template hard-codes 48 GB of RAM; it should use a Capo profile instead

  - See if Mustache can access Capo properties without much extra work

- `WS-447 <https://open-jira.nrao.edu/browse/WS-447>`__
- `metadata.json`: Rename fields so they describe more accurately what the values represent
- Make the code compliant with `pylint`

  - This paves the way for a `pylint` pre-commit hook

- Use zope transactions for AMQP and REST

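
The Capo item could look roughly like the following minimal sketch. It assumes the calibration template is an HTCondor submit description containing a `request_memory` line and that the profile uses Java-properties-style `key = value` lines; it uses the standard library's `string.Template` as a stand-in for Mustache, and the property key is invented for illustration. None of these names come from the actual codebase.

.. code-block:: python

    from string import Template

    # Hypothetical profile contents; the key name is invented for this sketch
    PROFILE_TEXT = """
    # hypothetical profile
    edu.nrao.workspaces.CalibrationSettings.ramInGb = 48
    """

    RAM_KEY = "edu.nrao.workspaces.CalibrationSettings.ramInGb"


    def load_properties(text: str) -> dict:
        """Parse simple 'key = value' lines, skipping blank lines and '#' comments."""
        props = {}
        for line in text.splitlines():
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            key, _, value = line.partition("=")
            props[key.strip()] = value.strip()
        return props


    # Stand-in for the calibration template: memory is a placeholder, not a literal 48
    SUBMIT_TEMPLATE = Template("request_memory = ${ram}G\n")


    def render_submit(props: dict) -> str:
        """Fill the template from the profile; a missing key raises KeyError."""
        return SUBMIT_TEMPLATE.substitute(ram=props[RAM_KEY])

With the sample profile, `render_submit(load_properties(PROFILE_TEXT))` produces `request_memory = 48G`. Failing loudly on a missing key, rather than falling back to 48, keeps the hard-coded value from creeping back in.
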
Dependencies and Overall Structure
----------------------------------

- Move service implementations out of `shared/workspaces` and into the relevant services
- Split each service's interfaces into its own package so that we can be sure that, for example,
  the workflow service has no access at all to the capability interfaces

  - Can REST API implementations of these interfaces be created?
  - If so, can those REST API implementations become dependencies of, e.g., the capability service?

- `List of Developer Identified Desired System Improvements <https://open-confluence.nrao.edu/display/SSA/Gripes>`__
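
One way to picture the interface split and the REST question is the sketch below, which assumes a `typing.Protocol` is the interface package's only export. `WorkflowServiceIF`, `execute`, the URL layout, the `std_calibration` workflow name and the `workflow_request_id` response field are all hypothetical. The capability-side function depends only on the protocol, so either implementation can be passed in.

.. code-block:: python

    import json
    from typing import Callable, Protocol


    class WorkflowServiceIF(Protocol):
        """Hypothetical interface: the only workflow code the capability service imports."""
        def execute(self, workflow_name: str, argument: dict) -> int: ...


    class InProcessWorkflowService:
        """Local implementation (would live in the workflow service's own package)."""
        def __init__(self) -> None:
            self.submitted: list = []

        def execute(self, workflow_name: str, argument: dict) -> int:
            self.submitted.append((workflow_name, argument))
            return len(self.submitted)  # fake workflow request ID


    class RestWorkflowService:
        """REST-backed implementation of the same interface; the URL layout is made up."""
        def __init__(self, base_url: str, post: Callable[[str, bytes], bytes]) -> None:
            self.base_url, self.post = base_url, post  # `post` is injected (e.g. a urllib wrapper)

        def execute(self, workflow_name: str, argument: dict) -> int:
            body = json.dumps({"workflow": workflow_name, "argument": argument}).encode()
            reply = self.post(f"{self.base_url}/workflows/{workflow_name}/requests", body)
            return json.loads(reply)["workflow_request_id"]


    def launch_calibration(workflows: WorkflowServiceIF) -> int:
        # Capability-side code sees only the interface, never an implementation package
        return workflows.execute("std_calibration", {"fsid": "hypothetical-fsid"})

Injecting `post` keeps the REST client testable without a network; in practice it could wrap `urllib.request.urlopen`.
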